Also see my comments in what is perhaps a related bug: https://bugzilla.redhat.com/show_bug.cgi?id=1458238#c9

Our all-in-one (no HA, no cluster, local storage) implementation of OpenShift Standalone Registry is seeing what seems like abnormally high RAM usage from the origin-master process.

===
# oc version
oc v1.5.1
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://192.168.1.17:8443
openshift v1.5.1
kubernetes v1.5.2+43a9be4

# ps aux | grep -e "openshift start master" -e USER
USER       PID %CPU %MEM      VSZ     RSS TTY  STAT START   TIME COMMAND
root     14240  3.7 74.0 19072488 5935176 ?    Ssl  Aug15 935:21 /usr/bin/openshift start master --config=/etc/origin/master/master-config.yaml --loglevel=2

# docker info
Containers: 12
 Running: 6
 Paused: 0
 Stopped: 6
Images: 9
Server Version: 1.12.6
Storage Driver: overlay2
 Backing Filesystem: xfs
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: host bridge null overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Security Options: seccomp selinux
Kernel Version: 3.10.0-514.21.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 4
Total Memory: 7.639 GiB
Name: registry.rdoproject.org.rdocloud
ID: HBS7:WFL6:5QFN:KM7M:TTSD:I557:J565:TVEJ:C6CE:ZEL2:GJCY:FPZQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 172.30.0.0/16
 127.0.0.0/8
Registries: docker.io (secure)
===

It is easily consuming over 6GB of RAM by itself, pushing an 8GB RAM node well into 1.5GB of swap territory. Restarting origin-master frees up the memory, but it eventually climbs back up again -- output from our monitoring this morning (yes, within 15 minutes):

09:15:28 CheckRAM OK: 41.23% RAM free
09:31:22 CheckRAM WARNING: 9.4% RAM free

We used openshift-ansible to deploy this all-in-one node; the code to deploy it is available here [1] and the variables passed to the openshift-ansible roles are here [2].

If I had to take a guess, this memory usage might be caused by the significant number of subsequent "docker push" operations (~100 images) after building a new batch of images.

[1]: https://github.com/rdo-infra/rdo-container-registry
[2]: https://github.com/rdo-infra/rdo-container-registry/blob/master/group_vars/OSEv3.yml
Created attachment 1323237 [details] Heap shortly after restarting origin-master
Created attachment 1323238 [details] Heap once origin-master is using most of the available RAM
Created attachment 1323239 [details] origin-master RAM usage screenshot
Created attachment 1323240 [details] origin-master logs
From the master log, it seems like you have a LOT of images. Can you please check how many images you have via `oc get images | wc -l` (as admin)? If there are many images, it sounds like you need to run pruning.
How often do you run the 100 builds? If you never prune, OpenShift keeps a record of every image you have ever built. There is a default "re-list" interval (roughly 15 minutes) that loads all images into a cache for the API server, which would explain why the memory usage climbed back up after 15 minutes. What I can recommend is to set up really aggressive pruning so you get rid of images that are not referenced by any image stream. That should free up the cache and cause the memory usage to go down.
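Something along these lines (a sketch only, not tested against your cluster; adjust the keep thresholds to your retention needs, and run it as a user with the system:image-pruner role):

# dry run first to see what would be removed
oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m

# then the same command with --confirm to actually delete
oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm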
As discussed, a namespaced 'oc get images -n <project> | wc -l' returned 39469. We build those images several times a day as part of a continuous testing and integration pipeline. I'll read the documentation on pruning and see if it helps.
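For the record, what I have in mind is simply running the prune from cron on the master node; a rough sketch, assuming the admin kubeconfig is at the default openshift-ansible location (the cron file path and schedule are just examples):

# /etc/cron.d/prune-openshift-images (example path)
# prune unreferenced images every hour, keeping the last 3 tag revisions
# and anything pushed within the last hour
0 * * * * root oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm --config=/etc/origin/master/admin.kubeconfig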
FWIW, memory usage on OpenShift 3.7 seems much, much better. Some stats at a quick glance so far for this new deployment:

- 10619 images
- 8203 tags
- 248 image streams
- 341GB disk space used

# free -m
              total        used        free      shared  buff/cache   available
Mem:           7822        3026         390           1        4405        4425
Swap:             0           0           0
Seems to be fine since 3.7. Closing.