Red Hat Bugzilla – Bug 1306349
corosync memory footprint increases on every node rejoin
Last modified: 2016-11-04 02:49:34 EDT
Created attachment 1122838 [details]
Proposed patch

totempg: Fix memory leak

Previously there were two free lists: one for the operational state and one for the transitional state. Because every node starts in the transitional state and always ends up in the operational state, the assembly was always returned to the operational-state free list and never to the transitional one, so a new assembly structure was always allocated whenever a new node connected. The solution is to keep only a single free list.
*** Bug 1309806 has been marked as a duplicate of this bug. ***
This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune@redhat.com with any questions.
While looping on one node:

    systemctl start corosync.service; sleep 2; systemctl stop corosync.service; sleep 2

corosync memory on the non-looping node:

    # cat /proc/`pgrep corosync`/status | grep RSS
    Start:                      VmRSS:  4268 kB
    after 10 cycles:            VmRSS:  5484 kB
    after 50 cycles:            VmRSS:  6980 kB
    after next 50 (160 total):  VmRSS:  8784 kB

Unlike in RHEL 6, this does not stop after a couple of cycles; it keeps going like this even after hundreds of cycles. Example of a 50-cycle run:

    01 VmRSS: 5388 kB
    02 VmRSS: 5388 kB
    03 VmRSS: 5388 kB
    04 VmRSS: 5388 kB
    05 VmRSS: 5388 kB
    06 VmRSS: 5648 kB
    07 VmRSS: 5716 kB
    08 VmRSS: 5716 kB
    09 VmRSS: 5716 kB
    10 VmRSS: 5716 kB
    11 VmRSS: 5716 kB
    12 VmRSS: 5716 kB
    13 VmRSS: 5716 kB
    14 VmRSS: 5976 kB
    15 VmRSS: 5976 kB
    16 VmRSS: 5976 kB
    17 VmRSS: 6108 kB
    18 VmRSS: 6108 kB
    19 VmRSS: 6108 kB
    20 VmRSS: 6108 kB
    21 VmRSS: 6108 kB
    22 VmRSS: 6108 kB
    23 VmRSS: 6108 kB
    24 VmRSS: 6368 kB
    25 VmRSS: 6368 kB
    26 VmRSS: 6368 kB
    27 VmRSS: 6504 kB
    28 VmRSS: 6504 kB
    29 VmRSS: 6504 kB
    30 VmRSS: 6504 kB
    31 VmRSS: 6504 kB
    32 VmRSS: 6504 kB
    33 VmRSS: 6504 kB
    34 VmRSS: 6764 kB
    35 VmRSS: 6764 kB
    36 VmRSS: 6764 kB
    37 VmRSS: 6904 kB
    38 VmRSS: 6904 kB
    39 VmRSS: 6904 kB
    40 VmRSS: 6904 kB
    41 VmRSS: 6904 kB
    42 VmRSS: 6904 kB
    43 VmRSS: 6904 kB
    44 VmRSS: 7164 kB
    45 VmRSS: 7164 kB
    46 VmRSS: 7164 kB
    47 VmRSS: 7280 kB
    48 VmRSS: 7280 kB
    49 VmRSS: 7280 kB
    50 VmRSS: 7280 kB

As you can see, it increases steadily, just not on every rejoin but roughly every fifth one. Only corosync was running on both nodes (no pacemaker), started via systemctl.

corosync.conf:

    totem {
        version: 2
        secauth: off
        cluster_name: STSRHTS15446
        transport: udpu
    }

    nodelist {
        node {
            ring0_addr: virt-053
            nodeid: 1
        }
        node {
            ring0_addr: virt-054
            nodeid: 2
        }
    }

    quorum {
        provider: corosync_votequorum
        two_node: 1
    }

    logging {
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
    }

Installed packages:

    corosynclib-2.4.0-4.el7.x86_64
    libqb-1.0-1.el7.x86_64
    corosync-2.4.0-4.el7.x86_64
@jkortus: Your numbers look really interesting. I've tested exactly the same package versions on RHEL 7.2. I started corosync on the first node; its memory usage right after startup was:

    cat /proc/`pgrep corosync`/status | grep RSS
    VmRSS: 33276 kB

On the second node I executed:

    for i in `seq 1 200`; do systemctl start corosync.service; sleep 2; systemctl stop corosync.service; sleep 2; done

Memory usage on the first node after all cycles finished:

    cat /proc/`pgrep corosync`/status | grep RSS
    VmRSS: 33276 kB

Installed packages:

    corosync-2.4.0-4.el7.x86_64
    corosynclib-2.4.0-4.el7.x86_64
    libqb-1.0-1.el7.x86_64

So, to conclude:
- Your startup memory usage looks much smaller (maybe some change in glibc memory allocation)
- I'm not able to reproduce the bug

Can you please retry your test:
- on RHEL 7.2
- with the RHEL 7 compose, running more cycles (until memory grows to ~33 MB), and report what happens afterwards?
I can reproduce this on my RHEL 7.2 VMs; it behaves just as Jaroslav's cluster does, starting at 3988 kB and increasing.

    libqb-1.0-1.el7.x86_64
    glibc-2.17-106.el7_2.8.x86_64
    corosync-2.4.0-4.el7.x86_64
@Chrissie: yep, now I can reproduce it too. On the first run I was running corosync -f on the first node; without the foreground option the behavior is as described.
I've tried running the test without -f, with the second node running:

    for i in `seq 1 20000`; do systemctl start corosync.service; sleep 0.1; systemctl stop corosync.service; sleep 0.1; done

to speed things up. The result: memory grows to VmRSS: 12124 kB after fewer than 1000 cycles and then stays at the same number for the rest of the test.

So, to conclude: I believe the original bug is fixed (without the fix, memory would grow throughout the whole test), and the described behavior is the standard glibc behavior of not returning all freed memory to the OS (this is expected). It's probably a result of cmap (so the libqb trie) or logging. Moving back to ON_QA.
Thanks to both of you for the quick response! I'll run it again and see if it stabilizes around ~12 MB.
Longer run:

    00001 VmRSS:  5044 kB
    ...
    00215 VmRSS: 13212 kB
    00615 VmRSS: 13212 kB
    ...
    01570 VmRSS: 13224 kB
    ...
    02334 VmRSS: 13224 kB

Looks good even after 2K+ rejoins. Marking as VERIFIED with corosync-2.4.0-4.el7.x86_64.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2463.html