Description of problem:
Create two replica 3 volumes and enable the gluster-block profile on them.

On machine-1, execute:
for i in {1..200}; do gluster-block create vol1/block1 ha 3 192.168.122.61,192.168.122.123,192.168.122.113 1GiB && gluster-block delete vol1/block1; done

On machine-2, execute:
for i in {1..200}; do gluster-block create vol2/block2 ha 3 192.168.122.61,192.168.122.123,192.168.122.113 1GiB && gluster-block delete vol2/block2; done

After 40 minutes, gluster-blockd was killed by the OOM killer on 2 machines:

[ 5941.594297] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 5941.594302] [ 520] 0 520 15454 46 31 3 100 0 systemd-journal
[ 5941.594303] [ 550] 0 550 11854 0 26 3 574 -1000 systemd-udevd
[ 5941.594304] [ 661] 0 661 13888 6 28 3 102 -1000 auditd
[ 5941.594305] [ 671] 0 671 21136 0 13 3 54 0 audispd
[ 5941.594306] [ 673] 0 673 10907 0 27 3 86 0 sedispatch
[ 5941.594307] [ 686] 70 686 12547 0 29 3 104 0 avahi-daemon
[ 5941.594308] [ 687] 0 687 6425 0 17 3 69 0 qemu-ga
[ 5941.594309] [ 688] 0 688 1104 0 7 3 28 0 rngd
[ 5941.594310] [ 689] 0 689 12545 19 28 3 414 0 systemd-logind
[ 5941.594311] [ 691] 0 691 1642 1 9 3 29 0 mcelog
[ 5941.594312] [ 692] 172 692 46710 0 28 4 86 0 rtkit-daemon
[ 5941.594313] [ 693] 70 693 12547 0 27 3 85 0 avahi-daemon
[ 5941.594313] [ 694] 0 694 4220 1 15 3 50 0 alsactl
[ 5941.594314] [ 698] 81 698 14287 0 27 3 328 -900 dbus-daemon
[ 5941.594315] [ 700] 0 700 51153 0 34 3 126 0 gssproxy
[ 5941.594316] [ 709] 0 709 130329 2 102 4 5261 0 firewalld
[ 5941.594317] [ 710] 0 710 103902 0 70 3 442 0 ModemManager
[ 5941.594318] [ 712] 0 712 98766 0 44 4 212 0 accounts-daemon
[ 5941.594319] [ 726] 0 726 10025 0 22 3 69 0 spice-vdagentd
[ 5941.594320] [ 731] 998 731 133327 0 58 4 1407 0 polkitd
[ 5941.594321] [ 739] 0 739 111762 0 66 3 358 0 abrtd
[ 5941.594322] [ 741] 996 741 25760 12 21 3 86 0 chronyd
[ 5941.594322] [ 761] 0 761 132237 22 163 3 312 0 abrt-dump-journ
[ 5941.594323] [ 763] 0 763 132236 19 153 5 308 0 abrt-dump-journ
[ 5941.594324] [ 774] 0 774 158221 240 86 4 378 0 NetworkManager
[ 5941.594325] [ 797] 0 797 20785 1 43 4 212 -1000 sshd
[ 5941.594326] [ 801] 0 801 6490 0 19 3 51 0 atd
[ 5941.594327] [ 804] 0 804 33233 21 17 3 137 0 crond
[ 5941.594328] [ 805] 0 805 102391 0 47 3 279 0 gdm
[ 5941.594329] [ 899] 0 899 92978 1 64 3 302 0 gdm-session-wor
[ 5941.594330] [ 913] 42 913 16494 1 36 3 257 0 systemd
[ 5941.594330] [ 918] 0 918 21781 1 46 3 3074 0 dhclient
[ 5941.594331] [ 932] 42 932 24738 0 49 3 748 0 (sd-pam)
[ 5941.594332] [ 957] 42 957 112433 1 95 3 351 0 gdm-wayland-ses
[ 5941.594333] [ 964] 42 964 14144 1 27 3 163 0 dbus-daemon
[ 5941.594334] [ 967] 42 967 172779 0 113 4 466 0 gnome-session-b
[ 5941.594335] [ 976] 42 976 407794 5908 338 4 22976 0 gnome-shell
[ 5941.594336] [ 994] 0 994 107134 0 54 3 308 0 upowerd
[ 5941.594337] [ 1010] 42 1010 50712 18 97 3 1929 0 Xwayland
[ 5941.594338] [ 1012] 42 1012 86186 0 37 3 175 0 at-spi-bus-laun
[ 5941.594338] [ 1017] 42 1017 14074 0 27 3 109 0 dbus-daemon
[ 5941.594339] [ 1020] 42 1020 55841 0 44 3 195 0 at-spi2-registr
[ 5941.594340] [ 1026] 42 1026 164909 1 86 3 685 0 pulseaudio
[ 5941.594341] [ 1039] 42 1039 114957 22 42 5 578 0 ibus-daemon
[ 5941.594342] [ 1042] 42 1042 95605 0 36 3 163 0 ibus-dconf
[ 5941.594343] [ 1045] 42 1045 140030 1 152 3 2053 0 ibus-x11
[ 5941.594343] [ 1053] 42 1053 109779 0 61 3 326 0 xdg-permission-
[ 5941.594344] [ 1062] 0 1062 16544 13 34 3 158 0 wpa_supplicant
[ 5941.594345] [ 1063] 0 1063 244587 19 226 4 53465 0 packagekitd
[ 5941.594346] [ 1070] 42 1070 258935 69 185 4 1590 0 gnome-settings-
[ 5941.594347] [ 1072] 42 1072 10338 0 25 3 97 0 spice-vdagent
[ 5941.594348] [ 1090] 995 1090 103899 0 52 4 726 0 colord
[ 5941.594349] [ 1103] 42 1103 77156 0 35 3 161 0 ibus-engine-sim
[ 5941.594349] [ 1175] 0 1175 15963 1 34 3 238 0 systemd
[ 5941.594350] [ 1192] 0 1192 44699 0 51 3 775 0 (sd-pam)
[ 5941.594351] [ 5123] 0 5123 37727 1 74 3 320 0 sshd
[ 5941.594352] [ 5129] 0 5129 37727 0 72 3 337 0 sshd
[ 5941.594353] [ 5142] 0 5142 30958 1 15 3 531 0 bash
[ 5941.594354] [ 5680] 0 5680 5369979246 88629 1330 130 77195 0 tcmu-runner
[ 5941.594355] [ 6078] 0 6078 5368816741 1378 296 120 51789 0 glusterd
[ 5941.594356] [ 6210] 32 6210 14326 0 31 3 124 0 rpcbind
[ 5941.594356] [ 6212] 0 6212 5370501607 197078 2154 133 215162 0 gluster-blockd
[ 5941.594357] [ 6215] 0 6215 37727 6 76 3 313 0 sshd
[ 5941.594358] [ 6224] 0 6224 37727 3 75 3 319 0 sshd
[ 5941.594359] [ 6236] 0 6236 30731 30 15 3 273 0 bash
[ 5941.594360] [ 6275] 0 6275 5368838850 94596 469 126 28895 0 glusterfsd
[ 5941.594360] [ 6327] 0 6327 5368833902 76039 457 125 41244 0 glusterfsd
[ 5941.594361] [ 6351] 0 6351 5368803299 1193 185 103 5097 0 glusterfs
[ 5941.594362] Out of memory: Kill process 6212 (gluster-blockd) score 387 or sacrifice child
[ 5941.594371] Killed process 6212 (gluster-blockd) total-vm:21482006428kB, anon-rss:788312kB, file-rss:0kB, shmem-rss:0kB
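The task dump reports `total_vm`, `rss` and `swapents` in pages, but the final "Killed process" line reports kB. The two agree if the page size is 4 KiB: gluster-blockd's 197078 pages × 4 = 788312 kB, which matches anon-rss:788312kB. The minimal Python sketch below parses dump rows like these and ranks tasks by resident memory. It is illustrative only: the function and field names are my own, and the 4 KiB page size is an assumption. Run against the full dump above, it ranks gluster-blockd first, ahead of glusterfsd and tcmu-runner.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches one task row of the OOM-killer dump, for example:
# [ 5941.594356] [ 6212] 0 6212 5370501607 197078 2154 133 215162 0 gluster-blockd
# Columns: pid, uid, tgid, total_vm, rss, nr_ptes, nr_pmds, swapents,
# oom_score_adj, name.
ROW = re.compile(
    r"\[\s*[\d.]+\]\s+\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+\d+\s+\d+\s+(?P<swapents>\d+)\s+"
    r"(?P<oom_score_adj>-?\d+)\s+(?P<name>\S+)\s*$"
)

PAGE_KB = 4  # assumption: 4 KiB pages, as on typical x86_64 systems


class Task(NamedTuple):
    pid: int
    name: str
    total_vm_kb: int
    rss_kb: int
    swap_kb: int
    oom_score_adj: int


def parse_oom_dump(lines: Iterable[str]) -> List[Task]:
    """Parse task rows from an OOM dump, largest resident set first."""
    tasks = []
    for line in lines:
        m = ROW.search(line)
        if m is None:
            continue  # header, "Out of memory:", "Killed process" lines
        tasks.append(Task(
            pid=int(m["pid"]),
            name=m["name"],
            total_vm_kb=int(m["total_vm"]) * PAGE_KB,
            rss_kb=int(m["rss"]) * PAGE_KB,
            swap_kb=int(m["swapents"]) * PAGE_KB,
            oom_score_adj=int(m["oom_score_adj"]),
        ))
    return sorted(tasks, key=lambda t: t.rss_kb, reverse=True)


sample = [
    "[ 5941.594297] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name",
    "[ 5941.594303] [ 550] 0 550 11854 0 26 3 574 -1000 systemd-udevd",
    "[ 5941.594354] [ 5680] 0 5680 5369979246 88629 1330 130 77195 0 tcmu-runner",
    "[ 5941.594356] [ 6212] 0 6212 5370501607 197078 2154 133 215162 0 gluster-blockd",
    "[ 5941.594360] [ 6275] 0 6275 5368838850 94596 469 126 28895 0 glusterfsd",
]
top = parse_oom_dump(sample)[0]
print(top.name, top.rss_kb, top.total_vm_kb)  # gluster-blockd 788312 21482006428
```

The same 4 KiB conversion also reproduces the kernel's total-vm figure (5370501607 pages × 4 = 21482006428 kB). That cross-check supports the page-size assumption for this machine.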
I think we need to get the caching part done for gluster-blockd as well to fix this. I was running this test to see whether it would expose any races, but instead it triggered the OOM killer. IMO we should implement caching soon.
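Caching here would mean keeping one connection per volume open across requests instead of setting one up and tearing it down for every create/delete. A loop like the reproducer would then reuse a single handle rather than churn through hundreds, and the number of live handles would stay bounded. The Python sketch below shows a bounded least-recently-used cache under that assumption. The `opener`/`closer` callbacks are hypothetical stand-ins for the real connect and teardown calls, and `capacity` plays the role of a fixed limit such as the `--glfs-lru-count` value gluster-blockd runs with in the verification log. This is an illustration, not gluster-blockd's code.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LruHandleCache(Generic[K, V]):
    """Keep at most `capacity` open handles, closing the least recently used.

    Hypothetical illustration: `opener`/`closer` stand in for the real
    connect and teardown calls (in C, e.g. libgfapi's glfs_new/glfs_init
    and glfs_fini).
    """

    def __init__(self, capacity: int, opener: Callable[[K], V],
                 closer: Callable[[V], None]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._opener = opener
        self._closer = closer
        self._handles: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K) -> V:
        """Return a cached handle for `key`, opening one on a miss."""
        if key in self._handles:
            self._handles.move_to_end(key)  # mark as most recently used
            return self._handles[key]
        handle = self._opener(key)
        self._handles[key] = handle
        if len(self._handles) > self.capacity:
            _, oldest = self._handles.popitem(last=False)  # evict LRU entry
            self._closer(oldest)
        return handle

    def close_all(self) -> None:
        """Close every cached handle, oldest first."""
        while self._handles:
            _, handle = self._handles.popitem(last=False)
            self._closer(handle)


opened, closed = [], []


def fake_open(volume: str) -> str:
    """Hypothetical stand-in for an expensive per-volume connect."""
    opened.append(volume)
    return f"handle:{volume}"


cache = LruHandleCache(2, opener=fake_open, closer=closed.append)
for _ in range(200):   # 200 requests against the same volume
    cache.get("vol1")  # open exactly one handle
cache.get("vol2")
cache.get("vol3")      # exceeds capacity 2, so vol1's handle is closed
print(opened, closed)  # ['vol1', 'vol2', 'vol3'] ['handle:vol1']
```

The design choice is that eviction closes the handle immediately. Whatever resources a connection holds are therefore released as soon as it leaves the cache, instead of accumulating with the request count.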
Here are the patches open for review: https://review.gluster.org/#/c/17440/ https://review.gluster.org/#/c/17441/
Cache functionality has already been validated in bz 1464421. I changed the cache size to a non-default value by setting it in /etc/sysconfig/gluster-blockd, and it showed the expected behaviour: with the limit set to 4, listing a fifth volume made the least recently used one slow again (roughly 3 to 5 s for an uncached list versus about 0.03 s for a cached one). The actual memory leak is being tracked separately in bz 1196020. Moving this bug to VERIFIED in 3.3.0. Logs are pasted below.

[root@dhcp47-121 ~]# vim /etc/sysconfig/gluster-blockd   >> set the cache limit to '4'
[root@dhcp47-121 ~]# vim /etc/sysconfig/gluster-blockd
[root@dhcp47-121 ~]# systemctl restart gluster-blockd
[root@dhcp47-121 ~]#
[root@dhcp47-121 ~]# systemctl status gluster-blockd
● gluster-blockd.service - Gluster block storage utility
   Loaded: loaded (/usr/lib/systemd/system/gluster-blockd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-08-01 11:31:46 EDT; 4s ago
 Main PID: 18230 (gluster-blockd)
   CGroup: /system.slice/gluster-blockd.service
           └─18230 /usr/sbin/gluster-blockd --glfs-lru-count 4 --log-level INFO

Aug 01 11:31:46 dhcp47-121.lab.eng.blr.redhat.com systemd[1]: Started Gluster bl...
Aug 01 11:31:46 dhcp47-121.lab.eng.blr.redhat.com systemd[1]: Starting Gluster b...
Hint: Some lines were ellipsized, use -l to show in full.
[root@dhcp47-121 ~]#
[root@dhcp47-121 ~]# gluster-block list vol1
bk1
bk2
bk3
bk4
[root@dhcp47-121 ~]# time gluster-block list vol2
bk1
bk2
real    0m4.235s
user    0m0.003s
sys     0m0.014s
[root@dhcp47-121 ~]# time gluster-block list vol2
bk1
bk2
real    0m0.031s
user    0m0.006s
sys     0m0.001s
[root@dhcp47-121 ~]# time gluster-block list vol3
bk1
bk2
real    0m3.117s
user    0m0.003s
sys     0m0.004s
[root@dhcp47-121 ~]# time gluster-block list vol1
bk1
bk2
bk3
bk4
real    0m0.241s
user    0m0.002s
sys     0m0.003s
[root@dhcp47-121 ~]# time gluster-block list vol2
bk1
bk2
real    0m0.032s
user    0m0.003s
sys     0m0.004s
[root@dhcp47-121 ~]# time gluster-block list vol3
bk1
bk2
real    0m0.028s
user    0m0.002s
sys     0m0.005s
[root@dhcp47-121 ~]# time gluster-block list vol4
bk1
bk2
real    0m3.504s
user    0m0.001s
sys     0m0.004s
[root@dhcp47-121 ~]# time gluster-block list vol1
bk1
bk2
bk3
bk4
real    0m0.023s
user    0m0.003s
sys     0m0.001s
[root@dhcp47-121 ~]# time gluster-block list vol2
bk1
bk2
real    0m0.030s
user    0m0.003s
sys     0m0.005s
[root@dhcp47-121 ~]# time gluster-block list vol3
bk1
bk2
real    0m0.089s
user    0m0.002s
sys     0m0.004s
[root@dhcp47-121 ~]# time gluster-block list vol4
bk1
bk2
real    0m0.027s
user    0m0.001s
sys     0m0.005s
[root@dhcp47-121 ~]# time gluster-block list vol5
bk1
bk2
real    0m4.123s
user    0m0.004s
sys     0m0.006s
[root@dhcp47-121 ~]# time gluster-block list vol1
bk1
bk2
bk3
bk4
real    0m4.758s
user    0m0.004s
sys     0m0.004s
[root@dhcp47-121 ~]# time gluster-block list vol3
bk1
bk2
real    0m0.038s
user    0m0.004s
sys     0m0.010s
[root@dhcp47-121 ~]# time gluster-block list vol4
bk1
bk2
real    0m0.032s
user    0m0.001s
sys     0m0.006s
[root@dhcp47-121 ~]# time gluster-block list vol5
bk1
bk2
real    0m0.024s
user    0m0.001s
sys     0m0.003s
[root@dhcp47-121 ~]# time gluster-block list vol2
bk1
bk2
real    0m3.971s
user    0m0.003s
sys     0m0.008s
[root@dhcp47-121 ~]#
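With the limit at 4, the slow calls in the log above (about 3 to 5 s) line up with cache misses, and the fast ones (tens of milliseconds) line up with hits. The Python replay below checks this mechanically for several cache sizes. It rests on several assumptions of my own: each `gluster-block list` counts as one access to its volume, the first untimed `list vol1` after the restart also counts, eviction is least-recently-used, and anything slower than 1 s is treated as a miss.

```python
from collections import OrderedDict

# (volume, real seconds) for each `gluster-block list` in the log above,
# in order; None marks the first call after the restart, which was not timed.
CALLS = [
    ("vol1", None), ("vol2", 4.235), ("vol2", 0.031), ("vol3", 3.117),
    ("vol1", 0.241), ("vol2", 0.032), ("vol3", 0.028), ("vol4", 3.504),
    ("vol1", 0.023), ("vol2", 0.030), ("vol3", 0.089), ("vol4", 0.027),
    ("vol5", 4.123), ("vol1", 4.758), ("vol3", 0.038), ("vol4", 0.032),
    ("vol5", 0.024), ("vol2", 3.971),
]


def lru_misses(volumes, capacity):
    """Return one bool per access: True where an LRU cache of `capacity` misses."""
    cache, misses = OrderedDict(), []
    for vol in volumes:
        hit = vol in cache
        misses.append(not hit)
        if hit:
            cache.move_to_end(vol)  # refresh recency
        else:
            cache[vol] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return misses


def fits(capacity, threshold=1.0):
    """True if every timed call is slow exactly when LRU(capacity) misses."""
    predicted = lru_misses([vol for vol, _ in CALLS], capacity)
    return all((secs > threshold) == miss
               for (_, secs), miss in zip(CALLS, predicted)
               if secs is not None)


print({c: fits(c) for c in range(1, 7)})
# {1: False, 2: False, 3: False, 4: True, 5: False, 6: False}
```

Only a capacity of exactly 4 fits the log. This particular access sequence would also be consistent with first-in-first-out eviction, so it confirms the size limit more firmly than the exact eviction order.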
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:2773