Description of problem:
Ganesha + Tiering: "ganesha_grace invoked oom-killer" observed while removal of files is going on from a tiered volume.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-1
nfs-ganesha-2.3.1-3

How reproducible:
Once

Steps to Reproduce:
1. Create a 4-node cluster and configure ganesha on the cluster.
2. Create a tiered volume, attach the tier, and enable quota on the volume (a sketch of the commands used is included under Additional info below).

Volume Name: tiervolume
Type: Tier
Volume ID: 32d2eaf1-7a5b-4d39-8ec8-27bdb9bee4c1
Status: Started
Number of Bricks: 16
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.37.174:/bricks/brick3/b3
Brick2: 10.70.37.127:/bricks/brick3/b3
Brick3: 10.70.37.158:/bricks/brick3/b3
Brick4: 10.70.37.180:/bricks/brick3/b3
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick5: 10.70.37.180:/bricks/brick0/b0
Brick6: 10.70.37.158:/bricks/brick0/b0
Brick7: 10.70.37.127:/bricks/brick0/b0
Brick8: 10.70.37.174:/bricks/brick0/b0
Brick9: 10.70.37.180:/bricks/brick1/b1
Brick10: 10.70.37.158:/bricks/brick1/b1
Brick11: 10.70.37.127:/bricks/brick1/b1
Brick12: 10.70.37.174:/bricks/brick1/b1
Brick13: 10.70.37.180:/bricks/brick2/b2
Brick14: 10.70.37.158:/bricks/brick2/b2
Brick15: 10.70.37.127:/bricks/brick2/b2
Brick16: 10.70.37.174:/bricks/brick2/b2
Options Reconfigured:
cluster.watermark-hi: 40
cluster.watermark-low: 10
cluster.tier-mode: cache
features.ctr-enabled: on
ganesha.enable: on
features.cache-invalidation: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
nfs.disable: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: enable

3. Enable ganesha on the volume and mount it on 2 clients using vers=4.
4. Perform file creation (a large number of 100 KB files) from these mount points simultaneously. During file creation, the below issue was observed:
https://bugzilla.redhat.com/show_bug.cgi?id=1327773
5. Restart the ganesha service on the mounted node after the above issue is hit.
6. Start removing files from the mount points while doing ls simultaneously from the other mount point; the below issue was observed (continuous cache_invalidation messages in ganesha-gfapi.log):
https://bugzilla.redhat.com/show_bug.cgi?id=1323424
7. After some time, "ganesha_grace invoked oom-killer" was observed on one node of the cluster with the below trace in dmesg:

[313396.095545] ganesha_grace invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
[313396.095551] ganesha_grace cpuset=/ mems_allowed=0
[313396.095554] CPU: 0 PID: 5124 Comm: ganesha_grace Not tainted 3.10.0-327.el7.x86_64 #1
[313396.095556] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[313396.095558] ffff8800ca245080 00000000a3331627 ffff8800516d3b68 ffffffff816351f1
[313396.095562] ffff8800516d3bf8 ffffffff81630191 ffff8800be262830 ffff8800be262848
[313396.095564] ffffffff00000202 fffeefff00000000 0000000000000002 ffffffff81128803
[313396.095567] Call Trace:
[313396.095575] [<ffffffff816351f1>] dump_stack+0x19/0x1b
[313396.095578] [<ffffffff81630191>] dump_header+0x8e/0x214
[313396.095582] [<ffffffff81128803>] ? delayacct_end+0x63/0xb0
[313396.095586] [<ffffffff8116cdee>] oom_kill_process+0x24e/0x3b0
[313396.095590] [<ffffffff81088dae>] ? has_capability_noaudit+0x1e/0x30
[313396.095593] [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
[313396.095597] [<ffffffff811737f5>] __alloc_pages_nodemask+0xa95/0xb90
[313396.095601] [<ffffffff81078d73>] copy_process.part.25+0x163/0x1610
[313396.095604] [<ffffffff81285ea6>] ? security_file_alloc+0x16/0x20
[313396.095608] [<ffffffff811e07de>] ? alloc_file+0x1e/0xf0
[313396.095611] [<ffffffff8107a401>] do_fork+0xe1/0x320
[313396.095613] [<ffffffff81090731>] ? __set_task_blocked+0x41/0xa0
[313396.095616] [<ffffffff8107a6c6>] SyS_clone+0x16/0x20
[313396.095620] [<ffffffff81645c59>] stub_clone+0x69/0x90
[313396.095623] [<ffffffff81645909>] ? system_call_fastpath+0x16/0x1b

lrmd invoked oom-killer on the other 2 nodes with the below trace:

[127609.615506] lrmd invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
[127609.615513] lrmd cpuset=/ mems_allowed=0
[127609.615516] CPU: 3 PID: 12818 Comm: lrmd Not tainted 3.10.0-327.el7.x86_64 #1
[127609.615518] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[127609.615520] ffff8800d3f03980 00000000db61a341 ffff880210e7fb68 ffffffff816351f1
[127609.615524] ffff880210e7fbf8 ffffffff81630191 ffff8800363ba7c0 ffff8800363ba7d8
[127609.615526] ffffffff00000202 fff6efff00000000 0000000000000001 ffffffff81128803
[127609.615529] Call Trace:
[127609.615539] [<ffffffff816351f1>] dump_stack+0x19/0x1b
[127609.615543] [<ffffffff81630191>] dump_header+0x8e/0x214
[127609.615548] [<ffffffff81128803>] ? delayacct_end+0x63/0xb0
[127609.615553] [<ffffffff8116cdee>] oom_kill_process+0x24e/0x3b0
[127609.615556] [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
[127609.615559] [<ffffffff811737f5>] __alloc_pages_nodemask+0xa95/0xb90
[127609.615565] [<ffffffff81078d73>] copy_process.part.25+0x163/0x1610
[127609.615568] [<ffffffff8107a401>] do_fork+0xe1/0x320
[127609.615574] [<ffffffff811e3b5e>] ? SYSC_newstat+0x3e/0x60
[127609.615576] [<ffffffff8107a6c6>] SyS_clone+0x16/0x20
[127609.615581] [<ffffffff81645c59>] stub_clone+0x69/0x90
[127609.615584] [<ffffffff81645909>] ? system_call_fastpath+0x16/0x1b

Actual results:
"ganesha_grace invoked oom-killer" observed while doing IO on a tiered volume.

Expected results:
No OOM kill should be observed.

Additional info:
Also, pcs status shows the below failed actions:

Failed actions:
    nfs-grace_monitor_5000 on dhcp37-180.lab.eng.blr.redhat.com 'unknown error' (1): call=98, status=Timed Out, exit-reason='none', last-rc-change='Fri Apr 15 18:47:44 2016', queued=0ms, exec=0ms
    nfs-mon_monitor_10000 on dhcp37-127.lab.eng.blr.redhat.com 'unknown error' (1): call=40, status=Timed Out, exit-reason='none', last-rc-change='Fri Apr 15 16:35:07 2016', queued=0ms, exec=0ms
    nfs-grace_monitor_5000 on dhcp37-127.lab.eng.blr.redhat.com 'unknown error' (1): call=43, status=Timed Out, exit-reason='none', last-rc-change='Fri Apr 15 16:35:07 2016', queued=0ms, exec=0ms
    nfs-grace_monitor_5000 on dhcp37-158.lab.eng.blr.redhat.com 'unknown error' (1): call=34, status=Timed Out, exit-reason='none', last-rc-change='Fri Apr 15 16:35:11 2016', queued=0ms, exec=0ms
    nfs-mon_monitor_10000 on dhcp37-158.lab.eng.blr.redhat.com 'unknown error' (1): call=33, status=Timed Out, exit-reason='none', last-rc-change='Fri Apr 15 16:35:11 2016', queued=0ms, exec=0ms
sosreports and logs are placed at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1327831
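For reference, here is a minimal sketch of the setup behind steps 2 and 3. The brick paths and hostnames are copied from the volume info above, but the exact invocations (in particular the 3.7-era attach-tier syntax, the disperse/redundancy counts, and the <ganesha-VIP> placeholder) are reconstructed assumptions, not commands copied from the test run; the ganesha HA/shared-storage setup from step 1 is assumed to already be in place.

# Cold tier: 2 x (4+2) distributed-disperse volume across the four nodes
gluster volume create tiervolume disperse 6 redundancy 2 \
    10.70.37.180:/bricks/brick0/b0 10.70.37.158:/bricks/brick0/b0 \
    10.70.37.127:/bricks/brick0/b0 10.70.37.174:/bricks/brick0/b0 \
    10.70.37.180:/bricks/brick1/b1 10.70.37.158:/bricks/brick1/b1 \
    10.70.37.127:/bricks/brick1/b1 10.70.37.174:/bricks/brick1/b1 \
    10.70.37.180:/bricks/brick2/b2 10.70.37.158:/bricks/brick2/b2 \
    10.70.37.127:/bricks/brick2/b2 10.70.37.174:/bricks/brick2/b2
gluster volume start tiervolume

# Hot tier: 2 x 2 replicated bricks attached on top of the cold tier
gluster volume attach-tier tiervolume replica 2 \
    10.70.37.174:/bricks/brick3/b3 10.70.37.127:/bricks/brick3/b3 \
    10.70.37.158:/bricks/brick3/b3 10.70.37.180:/bricks/brick3/b3

# Quota and NFS-Ganesha export
gluster volume quota tiervolume enable
gluster nfs-ganesha enable
gluster volume set tiervolume ganesha.enable on

# On each client, using the VIP/hostname of whichever ganesha node is mounted
mount -t nfs -o vers=4 <ganesha-VIP>:/tiervolume /mnt/tiervolume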
Shashank, the OOM killer gets invoked when the system is experiencing a memory crunch. Do you see the nfs-ganesha (or any other related) process being killed by the OOM killer?
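For reference, the quickest check on each node is the kernel log itself; something like the following shows both the OOM-killer invocation and the eventual victim:

# Any "Killed process ..." line names the OOM victim
dmesg | grep -E 'invoked oom-killer|Out of memory|Killed process'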
Below messages are observed in dmesg:

on node1:
[313396.095816] Out of memory: Kill process 7431 (glusterfs) score 711 or sacrifice child
[313396.095880] Killed process 7431 (glusterfs) total-vm:132192500kB, anon-rss:5576680kB, file-rss:0k

on node2:
[127609.615788] Out of memory: Kill process 10162 (glusterfs) score 798 or sacrifice child
[127609.615857] Killed process 10162 (glusterfs) total-vm:151881424kB, anon-rss:6311664kB, file-rss:0kB

on node3:
[127218.901862] Out of memory: Kill process 11349 (glusterfs) score 754 or sacrifice child
[127218.901928] Killed process 11349 (glusterfs) total-vm:141319480kB, anon-rss:6073472kB, file-rss:780kB

Volume status shows self-heal processes not running on 3 nodes:

Status of volume: tiervolume
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.37.174:/bricks/brick3/b3                    49159     0          Y       12378
Brick 10.70.37.127:/bricks/brick3/b3                    49160     0          Y       11328
Brick 10.70.37.158:/bricks/brick3/b3                    49165     0          Y       10141
Brick 10.70.37.180:/bricks/brick3/b3                    49160     0          Y       7410
Cold Bricks:
Brick 10.70.37.180:/bricks/brick0/b0                    49157     0          Y       32233
Brick 10.70.37.158:/bricks/brick0/b0                    49162     0          Y       2675
Brick 10.70.37.127:/bricks/brick0/b0                    49157     0          Y       3909
Brick 10.70.37.174:/bricks/brick0/b0                    49156     0          Y       4960
Brick 10.70.37.180:/bricks/brick1/b1                    49158     0          Y       32252
Brick 10.70.37.158:/bricks/brick1/b1                    49163     0          Y       2694
Brick 10.70.37.127:/bricks/brick1/b1                    49158     0          Y       3928
Brick 10.70.37.174:/bricks/brick1/b1                    49157     0          Y       4979
Brick 10.70.37.180:/bricks/brick2/b2                    49159     0          Y       32271
Brick 10.70.37.158:/bricks/brick2/b2                    49164     0          Y       2713
Brick 10.70.37.127:/bricks/brick2/b2                    49159     0          Y       3947
Brick 10.70.37.174:/bricks/brick2/b2                    49158     0          Y       4998
Self-heal Daemon on localhost                           N/A       N/A        N       N/A
Quota Daemon on localhost                               N/A       N/A        Y       7439
Self-heal Daemon on dhcp37-158.lab.eng.blr.redhat.com   N/A       N/A        N       N/A
Quota Daemon on dhcp37-158.lab.eng.blr.redhat.com       N/A       N/A        Y       10179
Self-heal Daemon on dhcp37-174.lab.eng.blr.redhat.com   N/A       N/A        Y       12399
Quota Daemon on dhcp37-174.lab.eng.blr.redhat.com       N/A       N/A        Y       12413
Self-heal Daemon on dhcp37-127.lab.eng.blr.redhat.com   N/A       N/A        N       N/A
Quota Daemon on dhcp37-127.lab.eng.blr.redhat.com       N/A       N/A        Y       11363
Thanks, Shashank. Moving this to the AFR component to investigate the high memory usage by the self-heal daemon.
This issue has been seen frequently while removing files from a ganesha mount on a tiered volume. The self-heal daemon gets killed on all the nodes, resulting in a hang of the ganesha mount.
While trying to verify the bug with the build provided, below are the observations:

rpm versions:

[root@dhcp37-180 /]# rpm -qa|grep glusterfs
glusterfs-3.7.9-1.el7rhgs.testing.bz1327831.x86_64
glusterfs-fuse-3.7.9-1.el7rhgs.testing.bz1327831.x86_64
glusterfs-api-devel-3.7.9-1.el7rhgs.testing.bz1327831.x86_64
glusterfs-rdma-3.7.9-1.el7rhgs.testing.bz1327831.x86_64
glusterfs-libs-3.7.9-1.el7rhgs.testing.bz1327831.x86_64
glusterfs-client-xlators-3.7.9-1.el7rhgs.testing.bz1327831.x86_64
glusterfs-cli-3.7.9-1.el7rhgs.testing.bz1327831.x86_64
glusterfs-devel-3.7.9-1.el7rhgs.testing.bz1327831.x86_64
glusterfs-ganesha-3.7.9-1.el7rhgs.testing.bz1327831.x86_64
glusterfs-resource-agents-3.7.9-1.el7rhgs.testing.bz1327831.noarch
glusterfs-debuginfo-3.7.9-1.el7rhgs.testing.bz1327831.x86_64
glusterfs-api-3.7.9-1.el7rhgs.testing.bz1327831.x86_64
glusterfs-server-3.7.9-1.el7rhgs.testing.bz1327831.x86_64
glusterfs-geo-replication-3.7.9-1.el7rhgs.testing.bz1327831.x86_64

[root@dhcp37-180 /]# rpm -qa|grep ganesha
nfs-ganesha-2.3.1-4.el7rhgs.x86_64
nfs-ganesha-gluster-2.3.1-4.el7rhgs.x86_64
glusterfs-ganesha-3.7.9-1.el7rhgs.testing.bz1327831.x86_64

While creating 100 KB files from 2 mount points (120000 from each), 2 of the 4 nodes had the below CPU and memory utilization for the SHD:

[root@dhcp37-158 exports]# ps aux|grep 25330
root     25330 25.6 47.5 108718472 3808860 ?  Ssl  01:57  68:13 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1b764ddb77c8e070781bbf988d8cd97c.socket --xlator-option *replicate*.node-uuid=18fa3cca-c714-4c70-b227-cef260fffa27
[root@dhcp37-158 exports]# cat /proc/25330/oom_score
569

[root@dhcp37-174 exports]# ps aux|grep 26891
root     26891 28.2 61.1 127531980 4901076 ?  Ssl  01:57  78:45 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/209f5cf6a29e255887e0f676be136874.socket --xlator-option *replicate*.node-uuid=1a5a806a-ab58-462b-b939-0b8158a2d914
[root@dhcp37-174 exports]# cat /proc/26891/oom_score
668

At this point, I/O hung, and after some time the OOM kill issue was seen on node dhcp37-174 in dmesg:

[1039955.322429] glusterfs invoked oom-killer: gfp_mask=0x42d0, order=3, oom_score_adj=0
[1039955.322436] glusterfs cpuset=/ mems_allowed=0
[1039955.322439] CPU: 1 PID: 26980 Comm: glusterfs Not tainted 3.10.0-327.el7.x86_64 #1
[1039955.322441] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[1039955.322443] ffff8801d9541700 00000000868b512e ffff88018b0df840 ffffffff816351f1
[1039955.322447] ffff88018b0df8d0 ffffffff81630191 ffff880212160520 ffff880212160538
[1039955.322450] ffffffff00000202 fffeefff00000000 0000000000000004 ffffffff81128803
[1039955.322453] Call Trace:
[1039955.322463] [<ffffffff816351f1>] dump_stack+0x19/0x1b
[1039955.322467] [<ffffffff81630191>] dump_header+0x8e/0x214
[1039955.322472] [<ffffffff81128803>] ? delayacct_end+0x63/0xb0
[1039955.322477] [<ffffffff8116cdee>] oom_kill_process+0x24e/0x3b0
[1039955.322482] [<ffffffff81088dae>] ? has_capability_noaudit+0x1e/0x30
[1039955.322485] [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
[1039955.322489] [<ffffffff811737f5>] __alloc_pages_nodemask+0xa95/0xb90
[1039955.322494] [<ffffffff811b43f9>] alloc_pages_current+0xa9/0x170
[1039955.322500] [<ffffffff81514ad0>] sk_page_frag_refill+0x70/0x160
[1039955.322505] [<ffffffff81576b73>] tcp_sendmsg+0x263/0xc20
[1039955.322511] [<ffffffff815a0f44>] inet_sendmsg+0x64/0xb0
[1039955.322516] [<ffffffff812889d3>] ? selinux_socket_sendmsg+0x23/0x30
[1039955.322519] [<ffffffff8150fe47>] sock_aio_write+0x157/0x180
[1039955.322521] [<ffffffff8116b5e8>] ? wait_on_page_bit_killable+0x88/0xb0
[1039955.322525] [<ffffffff811dde69>] do_sync_readv_writev+0x79/0xd0
[1039955.322528] [<ffffffff811df43e>] do_readv_writev+0xce/0x260
[1039955.322533] [<ffffffff81197088>] ? handle_mm_fault+0x5b8/0xf50
[1039955.322539] [<ffffffff81058aaf>] ? kvm_clock_get_cycles+0x1f/0x30
[1039955.322544] [<ffffffff810d87ca>] ? __getnstimeofday64+0x3a/0xd0
[1039955.322546] [<ffffffff811df665>] vfs_writev+0x35/0x60
[1039955.322548] [<ffffffff811df81f>] SyS_writev+0x7f/0x110
[1039955.322554] [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
.
.
.
.
[1039955.322797] Out of memory: Kill process 26891 (glusterfs) score 668 or sacrifice child
[1039955.323035] Killed process 26891 (glusterfs) total-vm:127531980kB, anon-rss:4900364kB, file-rss:376kB

The other node, dhcp37-158, is hung and we are not able to log in to it. (Will update the bug once it comes back.)

No "client_cbk_cache_invalidation" messages are seen in the shd log.

Setup details are as under in case anyone wants to have a look:

[root@dhcp37-180 ~]# gluster peer status
Number of Peers: 3

Hostname: dhcp37-158.lab.eng.blr.redhat.com
Uuid: 18fa3cca-c714-4c70-b227-cef260fffa27
State: Peer in Cluster (Connected)

Hostname: dhcp37-127.lab.eng.blr.redhat.com
Uuid: 43649367-7f47-41cf-8d63-97896e3504d4
State: Peer in Cluster (Connected)

Hostname: dhcp37-174.lab.eng.blr.redhat.com
Uuid: 1a5a806a-ab58-462b-b939-0b8158a2d914
State: Peer in Cluster (Connected)
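In case the growth needs to be tracked again, a small loop like the one below samples the SHD's CPU, memory footprint, and oom_score over time. This is only a convenience sketch, not part of the test procedure; the figures above were collected with ad-hoc ps/cat invocations, and the pidfile path is the one visible in the ps output above.

# Sample the self-heal daemon's memory usage and oom_score every minute
SHD_PID=$(cat /var/lib/glusterd/glustershd/run/glustershd.pid)
while kill -0 "$SHD_PID" 2>/dev/null; do
    date
    ps -o pid,pcpu,pmem,vsz,rss,comm -p "$SHD_PID"
    cat /proc/"$SHD_PID"/oom_score
    sleep 60
done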
Observed that some of the nodes in the cluster were not accessible at all because of high CPU usage, so all the nodes had to be rebooted. sosreports from the nodes are placed at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1327831/latest
We are hitting this issue consistently on the tiered volume setup, and it makes the whole cluster unusable because of the high CPU and memory usage.

While running fssanity on a v3 ganesha mount on a tiered volume, after some of the test suites had executed, an OOM kill of the self-heal daemon was observed on all the nodes; the mounted node is not accessible, and all the bricks residing on that node went down.

[root@dhcp37-127 ~]# gluster vol info tiervolume

Volume Name: tiervolume
Type: Tier
Volume ID: 45fd73f7-e8ed-43da-b9c6-79ae042cef12
Status: Started
Number of Bricks: 16
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.37.174:/bricks/brick3/b3
Brick2: 10.70.37.127:/bricks/brick3/b3
Brick3: 10.70.37.158:/bricks/brick3/b3
Brick4: 10.70.37.180:/bricks/brick3/b3
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick5: 10.70.37.180:/bricks/brick0/b0
Brick6: 10.70.37.158:/bricks/brick0/b0
Brick7: 10.70.37.127:/bricks/brick0/b0
Brick8: 10.70.37.174:/bricks/brick0/b0
Brick9: 10.70.37.180:/bricks/brick1/b1
Brick10: 10.70.37.158:/bricks/brick1/b1
Brick11: 10.70.37.127:/bricks/brick1/b1
Brick12: 10.70.37.174:/bricks/brick1/b1
Brick13: 10.70.37.180:/bricks/brick2/b2
Brick14: 10.70.37.158:/bricks/brick2/b2
Brick15: 10.70.37.127:/bricks/brick2/b2
Brick16: 10.70.37.174:/bricks/brick2/b2
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.tier-mode: cache
features.ctr-enabled: on
nfs.disable: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: enable

[root@dhcp37-127 ~]# gluster vol status tiervolume
Status of volume: tiervolume
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.37.174:/bricks/brick3/b3                    49206     0          Y       5472
Brick 10.70.37.127:/bricks/brick3/b3                    49211     0          Y       3970
Brick 10.70.37.158:/bricks/brick3/b3                    49212     0          Y       3708
Cold Bricks:
Brick 10.70.37.158:/bricks/brick0/b0                    49209     0          Y       3525
Brick 10.70.37.127:/bricks/brick0/b0                    49208     0          Y       3780
Brick 10.70.37.174:/bricks/brick0/b0                    49203     0          Y       5291
Brick 10.70.37.158:/bricks/brick1/b1                    49210     0          Y       3547
Brick 10.70.37.127:/bricks/brick1/b1                    49209     0          Y       3799
Brick 10.70.37.174:/bricks/brick1/b1                    49204     0          Y       5310
Brick 10.70.37.158:/bricks/brick2/b2                    49211     0          Y       3566
Brick 10.70.37.127:/bricks/brick2/b2                    49210     0          Y       3818
Brick 10.70.37.174:/bricks/brick2/b2                    49205     0          Y       5329
Self-heal Daemon on localhost                           N/A       N/A        N       N/A
Quota Daemon on localhost                               N/A       N/A        Y       4323
Self-heal Daemon on dhcp37-174.lab.eng.blr.redhat.com   N/A       N/A        N       N/A
Quota Daemon on dhcp37-174.lab.eng.blr.redhat.com       N/A       N/A        Y       5828
Self-heal Daemon on dhcp37-158.lab.eng.blr.redhat.com   N/A       N/A        N       N/A
Quota Daemon on dhcp37-158.lab.eng.blr.redhat.com       N/A       N/A        Y       4055

Task Status of Volume tiervolume
------------------------------------------------------------------------------
Task                 : Tier migration
ID                   : bf23ff00-4a5f-4b30-a2f7-e942847d63a5
Status               : in progress
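To spot the offline self-heal daemons (the Online = N rows above) without scanning the whole table, a simple grep over the status output is enough; this is only a convenience check, not part of the original test steps (the -A1 catches the wrapped hostname line in the raw CLI output):

# On any node: show only the self-heal daemon rows of the volume status
gluster volume status tiervolume | grep -A1 'Self-heal Daemon'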
OK. The same daemon (Linux process) handles self-healing in both EC and AFR sub-volumes. We need to find out whether the leak is coming from AFR or EC. Could you help us isolate the translator that is leaking memory? There are two ways you can do this:

With the same distributed-replicate + distributed-disperse tiered volume, when you run your IO and self-heal and find that the memory consumed by the daemon is progressively rising, could you take a statedump of the SHD and attach it here? Here's what you need to do to capture the statedump:

$ kill -USR1 <pid-of-self-heal-daemon>

It would be helpful if you capture the statedump at several different points in time on the shd; that would prove that the memory consumed by the process is indeed progressively increasing and would help us isolate the data structure whose memory is being leaked.

OR

Run the same test twice - once with no disperse in the tiered volume, and another time with no AFR in the tiered volume. One of them (or both, if they both have leaks!) will be OOM-killed eventually.

-Krutika
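A rough sketch of how the repeated statedumps could be collected, assuming the default statedump directory /var/run/gluster and the glustershd pidfile path shown in the ps output earlier in this bug; the 10-minute interval and count of six are arbitrary choices, not part of the request above:

# Trigger a statedump of the self-heal daemon every 10 minutes, 6 times in total
SHD_PID=$(cat /var/lib/glusterd/glustershd/run/glustershd.pid)
for i in 1 2 3 4 5 6; do
    kill -USR1 "$SHD_PID"    # each signal should write a glusterdump.<pid>.dump.<timestamp> file
    sleep 600
done
ls -l /var/run/gluster/glusterdump."$SHD_PID".dump.*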
Just to confirm, I would prefer to run the steps once again with the 3.1.3 final build, and then based on the results we can close it.

Will update accordingly.
(In reply to Shashank Raj from comment #23)
> Just to confirm, I would prefer to run the steps once again with the 3.1.3
> final build, and then based on the results we can close it.
>
> Will update accordingly.

Thanks. When is this planned?
(In reply to Nithya Balachandran from comment #24)
> (In reply to Shashank Raj from comment #23)
> > Just to confirm, I would prefer to run the steps once again with the 3.1.3
> > final build, and then based on the results we can close it.
> >
> > Will update accordingly.
>
> Thanks. When is this planned?

Will try to finish it by EOW.
Tried reproducing the issue with the latest 3.1.3 build on a tiered volume. During I/O, the memory consumption of the shd now remains almost constant, whereas earlier its steady growth was causing the OOM kills. This issue can be closed, as it works fine with the latest builds.