Bug 1371985
| Summary: | Possible memory leak in openshift master process during reliability long run | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Vikas Laad <vlaad> |
| Component: | Node | Assignee: | Paul Morie <pmorie> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | DeShuai Ma <dma> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.3.0 | CC: | agoldste, aos-bugs, decarr, jeder, jokerman, mifiedle, mmccomas, pep, tschan+redhat, tstclair |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-10-26 18:01:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Vikas Laad
2016-08-31 15:29:11 UTC
Created attachment 1196445 [details]
pprof heap
Created attachment 1196446 [details]
pprof heap for another time
Vikas -- if you use stress to allocate a bunch of RAM and put the system under (gentle) memory pressure, does the RSS of the master go down?
https://copr-be.cloud.fedoraproject.org/results/ndokos/pbench/epel-7-x86_64/00182790-pbench-stress/

After further analysis, we are not seeing continued RSS growth. RSS usage stabilized, and in future it can be tuned further with etcd caching for origin-specific objects when we revisit https://github.com/openshift/origin/pull/10719

Created attachment 1199116 [details]
mem growth
I am still seeing the memory growth; the attached graph shows memory consumed by the openshift-master process only. Samples were taken every 15 minutes over the last few days.
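For anyone wanting to reproduce this kind of per-process sampling, here is a minimal sketch. It assumes a Linux host and reads `VmRSS` from `/proc/<pid>/status`; the function names are hypothetical, and this is not necessarily how the attached graph was produced.

```python
import re
from pathlib import Path


def parse_vm_rss_kb(status_text: str) -> int:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        # Kernel threads, for example, have no VmRSS line.
        raise ValueError("VmRSS line not found")
    return int(match.group(1))


def read_rss_kb(pid: int) -> int:
    """Read the current resident set size of a live process (Linux only)."""
    return parse_vm_rss_kb(Path(f"/proc/{pid}/status").read_text())
```

Calling `read_rss_kb(pid)` from a loop that sleeps 15 minutes between calls (or from a cron job) and appending the result with a timestamp to a file would give a series comparable to the one described above.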
This master is running with etcd embedded -- can we do this with an external etcd? It will be far, far easier to diagnose that way.

Started another run with external etcd, will update the bug again with some data.

I had another cluster with the same issue. After stopping the tests and deleting all the projects, memory usage on the nodes came down, but the master node and master process were still taking up the same memory (on this cluster it was at 2.3G). Then I ran the pbench stress mentioned in comment #3; that did not make any difference. Even forcing an OOM on the master using stress did not make the RSS of the master go down. Another note: after restarting the master process, memory consumption came down to 285MB.

Created attachment 1206900 [details]
Openshift master memory
Attached graph shows the memory growth after running the tests for 22 days; growth is still there, but it is slow right now.
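To put a number on "slow" growth, one option is a least-squares slope over the RSS samples. The sketch below is only an illustration: the function name is hypothetical and the sample values in the usage note are made up, not taken from the attached graph.

```python
def growth_rate_mb_per_day(samples):
    """Least-squares slope of RSS over time.

    samples: list of (hours_since_start, rss_mb) pairs.
    Returns the fitted growth rate in MB per day.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("all samples share the same timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t * 24  # slope is per hour; convert to per day
```

For example, `growth_rate_mb_per_day([(0, 285), (24, 295), (48, 305)])` fits a line through three illustrative daily samples. A slope that stays positive over weeks suggests a leak, while a slope that trends toward zero is more consistent with caches filling up.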