Red Hat Bugzilla – Bug 1004426
Rhevm Server System Memory Growth Concern
Last modified: 2016-02-10 14:35:15 EST
Rhevm 3.2 Server System Memory Growth Concern
Description of problem:
Over a 26-day period of running system / longevity tests with Rhevm, I've been collecting my test environment's system metrics while the system is running user admin and simulated client load. I noticed the memory usage on the Rhevm Server climbed from an initial 21.5GB to 28.3GB over that time (26 days), which is approx 86% of the total. Over the past 6 days I began monitoring the libvirt, watching it grow from VSZ (virt mem) 10.4GB to 11.8GB. I stopped all client load yesterday to see if Java's garbage collection would kick in and clean up, but the trend just seemed to level off. The test environment is currently in this condition and will remain in this state for a brief period of time in case developers wish to get access to it.
I'm sure I could keep driving the system to a point where physical memory would become exhausted, but then the system would most likely become unusable for both myself and the developers. I have plenty of collected data from the test environment to share if needed.
top - 14:59:40 up 26 days, 23:45, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 420 total, 2 running, 418 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32862732k total, 28372068k used, 4490664k free, 253588k buffers
Swap: 16498680k total, 0k used, 16498680k free, 20768292k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31746 ovirt 20 0 11.3g 2.4g 18m S 1.2 7.6 1539:24 java
Version-Release number of selected component:
System Test Env:
-Red Hat Enterprise Virtualization Manager Version: 3.2.1-0.39.el6ev
-Qty 1 Rhel6.4, Rhevm Server, high end Dell PowerEdge R710 Dual 8core, 32GBRam, rhevm-3.2.1-0.39.el6ev.noarch
-Qty 4 Rhel6.4, Hosts all high end Dell, PowerEdge R710 Dual 8core, 16GBRam
-Qty 1 Rhel6.4, Ipa Directory Server
-Qty 3 Rhel6.4, Load Client machines to drive simulated user load.
How reproducible: This is the first run
Steps to Reproduce:
1.Run System test load against the system for an extended period of time
Actual results: Memory growth trending upward
Expected results: Sustainable memory management with continued system operation and functionality without interruptions
I have been running a 30-day test using Rhevm 3.2.
Type : System / Longevity
Target Duration : 30 days
Current Duration: 26 days / Run 1
Total 46 Vms created
-ISCSI Total 500G
-Name Type Storage Format Cross Data-Center-Status FreeSpace
-ISCIMainStorage Data (Master) iSCSI V3 Active 263 GB
Data collection / monitoring:
All systems are being monitored for uptime, memory, swap, cpu, network io, disk io and disk space during the test run (except for the IPA Server / Clients). Tests here are to simulate client load but not stress the systems out. The idea is to simulate 1 year of activity in 30 days while monitoring system reliability and continued admin and user functionality.
System Test Load:
1. VM_Crud client: a Python multithreaded client using the sdk to cycle through a crud flow of VM(s) over a period of time defined by the tester to drive load against the system (10 threads)
2. VM_Migration client: a Python multithreaded client using the sdk to cycle through migrating running vms from host to host in the test environment over a period of time defined by the tester to drive load against the system (2 threads)
3. VM_Cycling client: a Python multithreaded client using the sdk to cycle through a random run, suspend, stop of existing VM(s) in the test environment over a period of time defined by the tester to drive load against the system (10 threads)
4. UserPortal client: a Python multithreaded client using selenium to drive the User Portal. The client cycles through unique users to run, stop or start the remote-viewer console of existing VM(s) in the test environment over a period of time defined by the tester to drive load against the system (10 threads)
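For readers who want to picture the shape of such a client, here is a minimal sketch of a multithreaded load driver that runs for a tester-defined period. It is not the actual test code: `run_load` and `fake_vm_cycle` are hypothetical names, and the placeholder operation only sleeps where a real client would call the sdk or selenium.

```python
import threading
import time


def run_load(operation, threads, duration_s):
    """Call operation(worker_index) in a loop from `threads` workers
    until `duration_s` seconds have passed. Returns total completed calls."""
    deadline = time.monotonic() + duration_s
    counts = [0] * threads  # one slot per worker, so no lock is needed

    def worker(index):
        while time.monotonic() < deadline:
            operation(index)
            counts[index] += 1

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return sum(counts)


def fake_vm_cycle(worker_index):
    # Hypothetical placeholder: a real client would run, suspend or stop
    # a VM here. This stand-in only sleeps briefly.
    time.sleep(0.01)


total = run_load(fake_vm_cycle, threads=2, duration_s=0.1)
print("completed operations:", total)
```

The thread count and duration are parameters, matching the "period of time defined by the tester" and per-client thread counts listed above.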
If I understand correctly you have a RHEV-M machine with 32 GiB of RAM and according to the output of top the RHEV-M process is only consuming 11.3 GiB of virtual space and 2.4 GiB of real RAM. The amount of virtual space is not really relevant, and the real amount of RAM is reasonable. Take into account that the Java virtual machine used by the RHEV-M process is configured to use up to 1 GiB of heap space by default. That plus the stacks of the threads account for most of those 2.4 GiB of real RAM. So I would say that this is normal, not alarming at all.
What is probably alarming you is the total amount of RAM in use reported by top, those 28 GiB. But you have to take into account that most of that space, approx 20 GiB, is used by the file system cache. This is normal as well: the kernel tries to use all the available memory, and if it is not needed for other things it uses it for the file system cache. So in the long term the machine should use all the available memory, until the whole file system is loaded in RAM. That is perfectly healthy.
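The arithmetic behind this can be checked directly from the top header in the report. The sketch below subtracts buffers and cache from the "used" figure; the function names are illustrative, and it assumes the top version in use counts buffers and cache as "used" (consistent with total minus free in the output above).

```python
# Values in KiB, copied from the `top` header in the report.
MEM_TOTAL_K = 32862732
MEM_USED_K = 28372068
BUFFERS_K = 253588
CACHED_K = 20768292


def used_excluding_cache(used_k, buffers_k, cached_k):
    """Memory in use once reclaimable buffers and page cache are left out."""
    return used_k - buffers_k - cached_k


def kib_to_gib(kib):
    return kib / (1024 * 1024)


app_k = used_excluding_cache(MEM_USED_K, BUFFERS_K, CACHED_K)
used_pct = 100 * MEM_USED_K / MEM_TOTAL_K

print(f"used as reported: {kib_to_gib(MEM_USED_K):.1f} GiB ({used_pct:.0f}% of total)")
print(f"file system cache: {kib_to_gib(CACHED_K):.1f} GiB")
print(f"used excluding buffers/cache: {kib_to_gib(app_k):.1f} GiB")
```

With these numbers, roughly 7 GiB of the reported 27 GiB is in use once buffers and cache are excluded.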
I would say that this is probably an indication that the machine has too much memory. I would suggest reducing it, maybe to 4 GiB instead of 32 GiB. This is the minimum required by RHEV-M (and even that is probably too large), so if you test with that you will be testing what our more demanding/constrained customers will do.
I would also suggest activating garbage collection logging in the RHEV-M Java virtual machine, so if there are problems in that area in the future we can analyze them. To do that add the following to /etc/sysconfig/ovirt-engine:
Then restart RHEV-M:
# service ovirt-engine restart
It will then start to produce garbage collection debug information in /var/log/ovirt-engine/console.log.
Sorry, obviously it should be:
Great, thanks for the good information you have relayed.
I'll take it into account and locate a server with 4GB for the next test run. Agreed, it is probably better to test with the minimum required RAM specified by the Red Hat docs. I'll activate the garbage collection logs as specified going forward.
In future runs I'll also monitor the ovirt java process from the start, along with the java heap memory. Will be starting a new longevity test run soon.
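One possible way to track the java process from the start of a run is to sample its virtual and resident sizes from /proc. This is only a sketch, assuming a Linux host; `parse_status` and `sample` are hypothetical helper names, not part of any RHEV-M tooling.

```python
def parse_status(text):
    """Extract VmSize and VmRSS (in KiB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The value looks like "  2516582 kB"; keep the number only.
            fields[key] = int(value.split()[0])
    return fields


def sample(pid):
    """Read the current sizes for one process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())


# Example with captured text instead of a live process:
example = "Name:\tjava\nVmSize:\t11848704 kB\nVmRSS:\t 2516582 kB\n"
print(parse_status(example))
```

Calling `sample(pid)` at a fixed interval and logging the result with a timestamp would give a per-process trend to set beside the system-wide numbers.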
I've also attached a LibreOffice Calc spreadsheet showing the trend I observed and was initially concerned about. (rhevm32_run1_system_memory_20days.ods)
Created attachment 795735 [details]
Rhevm System Memory Trend - run1
I'm closing this bug. If this concern reappears in the future please reopen.