The purpose of raising this bug is to track separately two actions performed on the same set of files. While doing rm -rf of files which were also being renamed from a different client, we saw OOM kills of the fuse mount. As discussed in the triage meeting for bz https://bugzilla.redhat.com/show_bug.cgi?id=1381140, we are raising a separate bug. For more info refer to bz https://bugzilla.redhat.com/show_bug.cgi?id=1381140. Also refer to bug https://bugzilla.redhat.com/show_bug.cgi?id=1400067, which was raised to track this with 3 different bugs.
We have noticed that the bug is not reproducible in the latest version of the product (RHGS-3.3.1+). If the bug is still relevant and still reproduces, feel free to reopen it.
Noticed that 'readdir-ahead' was not enabled on the volume, so my suspicion is not valid. @Nag, if you are restarting the client and starting the same type of job, I would like to get statedumps at three points: right away, 1 hour later, and 24 hours later. Just by looking at the logs, not much is evident. I see that there are many errors returned to the application (like EPERM, ENOENT etc.), so one possibility is that there is a leaky path in the negative scenarios which doesn't get executed in the happy path. Need similar tests to check for the exact leak.
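For reference, something like the following could be used on the client to collect that statedump series. This is only a minimal sketch: it assumes the standard SIGUSR1 statedump mechanism and the default dump directory (/var/run/gluster), and the pgrep pattern and timings are illustrative assumptions, not the exact procedure used here.

CLIENT_PID=$(pgrep -f 'glusterfs.*rpcx3' | head -n 1)   # hypothetical pattern for this volume's fuse client

take_dump() {
    kill -USR1 "$CLIENT_PID"          # asks the glusterfs client process to write a statedump
    sleep 5                           # give it a moment to finish writing
    ls -lt /var/run/gluster | head    # newest glusterdump.<pid>.dump.* should appear here
}

take_dump                   # now
sleep 3600  && take_dump    # ~1 hour later
sleep 82800 && take_dump    # ~24 hours after the first dump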
Hit the problem again, in less than 7 hrs.

IOs being run from the client are below:

1) while true; do find * | xargs stat; done  ---> from the root of the volume mount

2) removing some directories (which had an untarred linux kernel) under /mnt/rpcx3-new/IOs/kernel/dhcp35-77.lab.eng.blr.redhat.com:

[root@dhcp35-77 dhcp35-77.lab.eng.blr.redhat.com]# time rm -rf dir.3*
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/main_usb.c’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/rf.c’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/usbpipe.h’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/Kconfig’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/TODO’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/baseband.c’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/card.c’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/card.h’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/channel.c’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/device.h’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/dpc.c’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/firmware.h’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/key.c’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/mac.h’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/power.h’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/rf.h’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/rxtx.c’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/rxtx.h’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/wcmd.c’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/vt6656/wcmd.h’: Transport endpoint is not connected
rm: cannot remove ‘dir.3/linux-4.20.8/drivers/staging/wlan-ng’: Transport endpoint is not connected
rm: fts_read failed: Transport endpoint is not connected

real    384m9.243s
user    0m0.622s
sys     0m7.248s

3) untar of the linux kernel into the same parent directory as 2), but without conflicting with 2)

4) capturing resource o/p to a file in append mode every 2 minutes on the mount point (a rough sketch of this step is shown below)

5) taking statedumps every 30 min

sosreports and logs, with statedumps of the fuse mount proc taken every 30 min:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1400071/clients/reproducedissue-with-statedumps-of-fuse-mount/
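As referenced in step 4 above, the resource-capture loop could look roughly like this. The log file name is an assumption and the actual script used in the test may differ; 'ps -C glusterfs' simply selects the glusterfs client process on this box.

# Append memory/CPU figures for the fuse client to a log file every 2 minutes (sketch only).
LOG=/root/rpcx3-client-resources.log    # assumed output file
while true; do
    date >> "$LOG"
    ps -C glusterfs -o pid,vsz,rss,pmem,pcpu,etime,args >> "$LOG"
    free -m >> "$LOG"
    sleep 120
done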
[Mon Mar 11 17:48:50 2019] glustersigwait invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[Mon Mar 11 17:48:50 2019] glustersigwait cpuset=/ mems_allowed=0
[Mon Mar 11 17:48:50 2019] CPU: 0 PID: 32195 Comm: glustersigwait Kdump: loaded Not tainted 3.10.0-957.5.1.el7.x86_64 #1
[Mon Mar 11 17:48:50 2019] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[Mon Mar 11 17:48:50 2019] Call Trace:
[Mon Mar 11 17:48:50 2019] [<ffffffff99761e41>] dump_stack+0x19/0x1b
[Mon Mar 11 17:48:50 2019] [<ffffffff9975c86a>] dump_header+0x90/0x229
[Mon Mar 11 17:48:50 2019] [<ffffffff99300f3b>] ? cred_has_capability+0x6b/0x120
[Mon Mar 11 17:48:50 2019] [<ffffffff991ba524>] oom_kill_process+0x254/0x3d0
[Mon Mar 11 17:48:50 2019] [<ffffffff9930101e>] ? selinux_capable+0x2e/0x40
[Mon Mar 11 17:48:50 2019] [<ffffffff991bad66>] out_of_memory+0x4b6/0x4f0
[Mon Mar 11 17:48:50 2019] [<ffffffff9975d36e>] __alloc_pages_slowpath+0x5d6/0x724
[Mon Mar 11 17:48:50 2019] [<ffffffff991c1145>] __alloc_pages_nodemask+0x405/0x420
[Mon Mar 11 17:48:50 2019] [<ffffffff99211535>] alloc_pages_vma+0xb5/0x200
[Mon Mar 11 17:48:50 2019] [<ffffffff991ff785>] __read_swap_cache_async+0x115/0x190
[Mon Mar 11 17:48:50 2019] [<ffffffff991ff826>] read_swap_cache_async+0x26/0x60
[Mon Mar 11 17:48:50 2019] [<ffffffff991ff90c>] swapin_readahead+0xac/0x110
[Mon Mar 11 17:48:50 2019] [<ffffffff991e9a02>] handle_pte_fault+0x812/0xd10
[Mon Mar 11 17:48:50 2019] [<ffffffff991ec01d>] handle_mm_fault+0x39d/0x9b0
[Mon Mar 11 17:48:50 2019] [<ffffffff9976f5e3>] __do_page_fault+0x203/0x500
[Mon Mar 11 17:48:50 2019] [<ffffffff9976f9c6>] trace_do_page_fault+0x56/0x150
[Mon Mar 11 17:48:50 2019] [<ffffffff9976ef42>] do_async_page_fault+0x22/0xf0
[Mon Mar 11 17:48:50 2019] [<ffffffff9976b788>] async_page_fault+0x28/0x30
[Mon Mar 11 17:48:50 2019] Mem-Info:
[Mon Mar 11 17:48:50 2019] active_anon:665224 inactive_anon:250707 isolated_anon:0 active_file:0 inactive_file:19 isolated_file:0 unevictable:0 dirty:0 writeback:1 unstable:0 slab_reclaimable:5816 slab_unreclaimable:9495 mapped:118 shmem:226 pagetables:4908 bounce:0 free:21607 free_pcp:487 free_cma:0
[Mon Mar 11 17:48:50 2019] Node 0 DMA free:15364kB min:276kB low:344kB high:412kB active_anon:148kB inactive_anon:356kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:12kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Mon Mar 11 17:48:50 2019] lowmem_reserve[]: 0 2815 3773 3773
[Mon Mar 11 17:48:50 2019] Node 0 DMA32 free:53972kB min:50200kB low:62748kB high:75300kB active_anon:2210716kB inactive_anon:552440kB active_file:0kB inactive_file:68kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129336kB managed:2883116kB mlocked:0kB dirty:0kB writeback:4kB mapped:336kB shmem:704kB slab_reclaimable:13916kB slab_unreclaimable:23952kB kernel_stack:1488kB pagetables:13244kB unstable:0kB bounce:0kB free_pcp:1580kB local_pcp:56kB free_cma:0kB writeback_tmp:0kB pages_scanned:1361 all_unreclaimable? yes
[Mon Mar 11 17:48:50 2019] lowmem_reserve[]: 0 0 958 958
[Mon Mar 11 17:48:50 2019] Node 0 Normal free:17092kB min:17100kB low:21372kB high:25648kB active_anon:450032kB inactive_anon:450032kB active_file:8kB inactive_file:8kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1048576kB managed:981200kB mlocked:0kB dirty:0kB writeback:0kB mapped:136kB shmem:200kB slab_reclaimable:9344kB slab_unreclaimable:14008kB kernel_stack:1296kB pagetables:6376kB unstable:0kB bounce:0kB free_pcp:368kB local_pcp:108kB free_cma:0kB writeback_tmp:0kB pages_scanned:951 all_unreclaimable? yes
[Mon Mar 11 17:48:50 2019] lowmem_reserve[]: 0 0 0 0
[Mon Mar 11 17:48:50 2019] Node 0 DMA: 3*4kB (EM) 3*8kB (UE) 4*16kB (UE) 3*32kB (UE) 1*64kB (E) 2*128kB (UE) 2*256kB (UE) 2*512kB (EM) 3*1024kB (UEM) 1*2048kB (E) 2*4096kB (M) = 15364kB
[Mon Mar 11 17:48:50 2019] Node 0 DMA32: 695*4kB (UEM) 776*8kB (UE) 741*16kB (UEM) 383*32kB (UEM) 204*64kB (UEM) 58*128kB (UEM) 2*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54092kB
[Mon Mar 11 17:48:50 2019] Node 0 Normal: 227*4kB (UEM) 266*8kB (UEM) 293*16kB (UEM) 135*32kB (UEM) 59*64kB (UEM) 10*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17100kB
[Mon Mar 11 17:48:50 2019] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Mon Mar 11 17:48:50 2019] 8296 total pagecache pages
[Mon Mar 11 17:48:50 2019] 8015 pages in swap cache
[Mon Mar 11 17:48:50 2019] Swap cache stats: add 495331795, delete 495325510, find 411130330/473729900
[Mon Mar 11 17:48:50 2019] Free swap = 0kB
[Mon Mar 11 17:48:50 2019] Total swap = 4063228kB
[Mon Mar 11 17:48:50 2019] 1048476 pages RAM
[Mon Mar 11 17:48:50 2019] 0 pages HighMem/MovableOnly
[Mon Mar 11 17:48:50 2019] 78420 pages reserved
[Mon Mar 11 17:48:50 2019] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Mar 11 17:48:50 2019] [ 1603] 0 1603 9866 98 25 63 0 systemd-journal
[Mon Mar 11 17:48:50 2019] [ 1631] 0 1631 50270 0 31 413 0 lvmetad
[Mon Mar 11 17:48:50 2019] [ 1636] 0 1636 11920 2 27 574 -1000 systemd-udevd
[Mon Mar 11 17:48:50 2019] [ 2973] 0 2973 13880 8 26 103 -1000 auditd
[Mon Mar 11 17:48:50 2019] [ 2997] 0 2997 6594 18 18 58 0 systemd-logind
[Mon Mar 11 17:48:50 2019] [ 2998] 999 2998 153254 0 59 2306 0 polkitd
[Mon Mar 11 17:48:50 2019] [ 3002] 0 3002 5383 20 14 41 0 irqbalance
[Mon Mar 11 17:48:50 2019] [ 3005] 81 3005 16686 72 34 175 -900 dbus-daemon
[Mon Mar 11 17:48:50 2019] [ 3008] 998 3008 29446 17 29 97 0 chronyd
[Mon Mar 11 17:48:50 2019] [ 3054] 0 3054 31572 16 21 141 0 crond
[Mon Mar 11 17:48:50 2019] [ 3058] 0 3058 27523 1 10 31 0 agetty
[Mon Mar 11 17:48:50 2019] [ 3062] 0 3062 90507 0 98 6499 0 firewalld
[Mon Mar 11 17:48:50 2019] [ 3063] 0 3063 156358 233 89 332 0 NetworkManager
[Mon Mar 11 17:48:50 2019] [ 3366] 0 3366 26865 44 54 455 0 dhclient
[Mon Mar 11 17:48:50 2019] [ 3575] 0 3575 28215 0 57 270 -1000 sshd
[Mon Mar 11 17:48:50 2019] [ 3578] 0 3578 143546 130 98 3259 0 tuned
[Mon Mar 11 17:48:50 2019] [ 3579] 0 3579 54102 592 43 1131 0 rsyslogd
[Mon Mar 11 17:48:50 2019] [ 3601] 0 3601 26992 2 10 37 0 rhnsd
[Mon Mar 11 17:48:50 2019] [ 3807] 0 3807 22411 20 44 239 0 master
[Mon Mar 11 17:48:50 2019] [ 3812] 89 3812 22454 17 48 244 0 qmgr
[Mon Mar 11 17:48:50 2019] [ 4284] 0 4284 32008 26 17 169 0 screen
[Mon Mar 11 17:48:50 2019] [ 4285] 0 4285 33513 186 23 4582 0 bash
[Mon Mar 11 17:48:50 2019] [ 6544] 0 6544 79626 0 121 3537 -900 rhsmd
[Mon Mar 11 17:48:50 2019] [21953] 0 21953 39331 0 79 504 0 sshd
[Mon Mar 11 17:48:50 2019] [22543] 0 22543 28912 2 13 146 0 bash
[Mon Mar 11 17:48:50 2019] [24745] 0 24745 31976 1 18 145 0 screen
[Mon Mar 11 17:48:50 2019] [24746] 0 24746 28893 2 13 139 0 bash
[Mon Mar 11 17:48:50 2019] [25498] 0 25498 31975 1 18 148 0 screen
[Mon Mar 11 17:48:50 2019] [25499] 0 25499 28893 2 14 131 0 bash
[Mon Mar 11 17:48:50 2019] [ 1624] 0 1624 31975 91 18 74 0 screen
[Mon Mar 11 17:48:50 2019] [ 1625] 0 1625 29021 168 13 80 0 bash
[Mon Mar 11 17:48:50 2019] [11285] 0 11285 45590 3 46 232 0 crond
[Mon Mar 11 17:48:50 2019] [17449] 0 17449 45590 3 46 232 0 crond
[Mon Mar 11 17:48:50 2019] [20152] 0 20152 45590 3 46 232 0 crond
[Mon Mar 11 17:48:50 2019] [25375] 0 25375 45590 3 46 232 0 crond
[Mon Mar 11 17:48:50 2019] [27958] 0 27958 45590 3 46 232 0 crond
[Mon Mar 11 17:48:50 2019] [10551] 0 10551 45590 3 46 232 0 crond
[Mon Mar 11 17:48:50 2019] [32193] 0 32193 1956326 905728 3303 740779 0 glusterfs
[Mon Mar 11 17:48:50 2019] [32712] 0 32712 32008 2 17 185 0 screen
[Mon Mar 11 17:48:50 2019] [32713] 0 32713 28893 45 13 88 0 bash
[Mon Mar 11 17:48:50 2019] [ 9021] 89 9021 22437 14 44 237 0 pickup
[Mon Mar 11 17:48:50 2019] [ 3081] 0 3081 4120 31 13 0 0 find
[Mon Mar 11 17:48:50 2019] [ 3082] 0 3082 27063 24 9 0 0 xargs
[Mon Mar 11 17:48:50 2019] [ 3104] 0 3104 26988 18 10 0 0 sleep
[Mon Mar 11 17:48:50 2019] [ 3105] 0 3105 26988 19 8 0 0 sleep
[Mon Mar 11 17:48:50 2019] Out of memory: Kill process 32193 (glusterfs) score 805 or sacrifice child
[Mon Mar 11 17:48:50 2019] Killed process 32193 (glusterfs) total-vm:7825304kB, anon-rss:3622912kB, file-rss:0kB, shmem-rss:0kB

[root@dhcp35-109 glusterfs]# ls
Few questions:

Looks like the process got killed when the overall memory usage was 900MB??

> [Mon Mar 11 17:48:50 2019] [32193] 0 32193 1956326 905728 3303 740779 0 glusterfs

Please note, if you want to run with that little memory, the recommended client configuration is '-olru-limit=10000'. I checked the inode table details in the statedump, and the new feature already looks like it is working fine (the limit is 128k, and the number of inodes in the lru list is always lower).

Next observation: the graph which was not active held significant memory in its mem-pools. Some of the top contributors are:

quick-read - ~150MB
io-cache   - ~30MB
io-stats   - ~30MB

The inode_ctx of replicate/client-protocol/dht were also in the tens of MBs. So I am not able to see any issue per se with the process, but for the given workload either the memory is not enough, or the above option should be set.
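For reference, a sketch of how the suggested lru-limit setting could be applied when mounting the fuse client. The server, volume, and mount-point names below are taken from the outputs in this bz and are assumptions as far as this example goes.

# Remount the fuse client with a bounded inode lru list, as recommended above.
umount /mnt/rpcx3-new
mount -t glusterfs -o lru-limit=10000 \
      rhs-client19.lab.eng.blr.redhat.com:/rpcx3 /mnt/rpcx3-new

# Roughly equivalent direct invocation of the client process:
# glusterfs --lru-limit=10000 \
#           --volfile-server=rhs-client19.lab.eng.blr.redhat.com \
#           --volfile-id=rpcx3 /mnt/rpcx3-new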
Please include the gluster volume info output whenever filing a BZ.
My bad, requested details are below and also available @ https://docs.google.com/spreadsheets/d/17Yf9ZRWnWOpbRyFQ2ZYxAAlp9I_yarzKZdjN8idBJM0/edit#gid=1472913705

[root@rhs-client19 ~]# gluster v info

Volume Name: rpcx3
Type: Distributed-Replicate
Volume ID: f7532c65-63d0-4e4a-a5b5-c95238635eff
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x 3 = 15
Transport-type: tcp
Bricks:
Brick1: rhs-client19.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3
Brick2: rhs-client25.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3
Brick3: rhs-client32.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3
Brick4: rhs-client25.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3
Brick5: rhs-client32.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3
Brick6: rhs-client38.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3
Brick7: rhs-client32.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3
Brick8: rhs-client38.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3
Brick9: rhs-client19.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3
Brick10: rhs-client38.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3-newb
Brick11: rhs-client19.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3-newb
Brick12: rhs-client25.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3-newb
Brick13: rhs-client19.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3-newb
Brick14: rhs-client25.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3-newb
Brick15: rhs-client32.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3-newb
Options Reconfigured:
cluster.rebal-throttle: aggressive
diagnostics.client-log-level: INFO
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
features.uss: enable
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on

[root@rhs-client19 ~]# gluster v status
Status of volume: rpcx3
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhs-client19.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3        49152  0  Y  10824
Brick rhs-client25.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3        49152  0  Y  5232
Brick rhs-client32.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3        49152  0  Y  10898
Brick rhs-client25.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3        49153  0  Y  5253
Brick rhs-client32.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3        49153  0  Y  10904
Brick rhs-client38.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3        49152  0  Y  31256
Brick rhs-client32.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3        49154  0  Y  10998
Brick rhs-client38.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3        49153  0  Y  31277
Brick rhs-client19.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3        49153  0  Y  10826
Brick rhs-client38.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3-newb   49155  0  Y  19062
Brick rhs-client19.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3-newb   49155  0  Y  29805
Brick rhs-client25.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3-newb   49155  0  Y  30021
Brick rhs-client19.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3-newb   49156  0  Y  29826
Brick rhs-client25.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3-newb   49156  0  Y  30042
Brick rhs-client32.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3-newb   49156  0  Y  1636
Snapshot Daemon on localhost                                           49154  0    Y  10872
Self-heal Daemon on localhost                                          N/A    N/A  Y  29849
Quota Daemon on localhost                                              N/A    N/A  Y  29860
Snapshot Daemon on rhs-client32.lab.eng.blr.redhat.com                 49155  0    Y  11221
Self-heal Daemon on rhs-client32.lab.eng.blr.redhat.com                N/A    N/A  Y  1658
Quota Daemon on rhs-client32.lab.eng.blr.redhat.com                    N/A    N/A  Y  1668
Snapshot Daemon on rhs-client38.lab.eng.blr.redhat.com                 49154  0    Y  18492
Self-heal Daemon on rhs-client38.lab.eng.blr.redhat.com                N/A    N/A  Y  19097
Quota Daemon on rhs-client38.lab.eng.blr.redhat.com                    N/A    N/A  Y  19115
Snapshot Daemon on rhs-client25.lab.eng.blr.redhat.com                 49154  0    Y  9833
Self-heal Daemon on rhs-client25.lab.eng.blr.redhat.com                N/A    N/A  Y  30065
Quota Daemon on rhs-client25.lab.eng.blr.redhat.com                    N/A    N/A  Y  30076

Task Status of Volume rpcx3
------------------------------------------------------------------------------
Task   : Rebalance
ID     : 2cd252ed-3202-4c7f-99bd-6326058c797f
Status : in progress
(In reply to Amar Tumballi from comment #14)
> Few questions:
>
> Looks like the process got killed when the overall memory usage was 900MB??
>
> > [Mon Mar 11 17:48:50 2019] [32193] 0 32193 1956326 905728 3303 740779 0 glusterfs
>
> Please note, if you want to run with that little memory, the recommended
> client configuration is '-olru-limit=10000'. I checked the inode table
> details in the statedump, and the new feature already looks like it is
> working fine (the limit is 128k, and the number of inodes in the lru list
> is always lower).
>
> Next observation: the graph which was not active held significant memory in
> its mem-pools. Some of the top contributors are:
>
> quick-read - ~150MB
> io-cache   - ~30MB
> io-stats   - ~30MB
>
> The inode_ctx of replicate/client-protocol/dht were also in the tens of
> MBs. So I am not able to see any issue per se with the process, but for the
> given workload either the memory is not enough, or the above option should
> be set.

I am retrying with a 16 GB baremetal client. Will update the results accordingly.
Please re-open when you get to test it with 3.5.0 bits and still see this behavior. For now, it is DEFERRED.
While an OOM kill hasn't happened yet, given that I am testing on a 64GB machine, I see memory spiking to almost 30GB. So I have been able to reproduce the memory leak. I have the statedumps and will be attaching them. Hence I am proposing this back for fixing in 3.5.0, as it has not been fixed there (see comment #23).
Sunil, can you help expedite this issue, as I need the machines due to limited resources?
(In reply to Csaba Henk from comment #36)
> Nithya, I see your point.
>
> In Glusterfs:
>
> struct _dentry {
>     struct list_head inode_list; /* list of dentries of inode */
>     struct list_head hash;       /* hash table pointers */
>     inode_t         *inode;      /* inode of this directory entry */
>     char            *name;       /* name of the directory entry */
>     inode_t         *parent;     /* directory of the entry */
> };
>
> In Linux kernel:
>
> struct dentry {
>     /* RCU lookup touched fields */
>     unsigned int d_flags;        /* protected by d_lock */
>     seqcount_t d_seq;            /* per dentry seqlock */
>     struct hlist_bl_node d_hash; /* lookup hash list */
>     struct dentry *d_parent;     /* parent directory */
>     ...
> }
>
> (https://github.com/torvalds/linux/blob/v5.2/include/linux/dcache.h#L94)
>
> That is, in Glusterfs the parent of a dentry is an inode, while in the
> kernel the parent of a dentry is a dentry. So in the kernel the in-memory
> tree is laid out purely from dentries, decoupled from inodes and their
> lifetime cycle.

But the kernel dentry structure also has a pointer to an inode:

struct dentry {
    /* RCU lookup touched fields */
    unsigned int d_flags;        /* protected by d_lock */
    seqcount_t d_seq;            /* per dentry seqlock */
    struct hlist_bl_node d_hash; /* lookup hash list */
    struct dentry *d_parent;     /* parent directory */
    struct qstr d_name;
    struct inode *d_inode;       /* Where the name belongs to - NULL is
                                  * negative */

So, what happens in the kernel when a file in a deep directory structure is looked up?

1. Are all the dentries leading from the file up to the root present in memory? I think yes.
2. Do all those dentries point to a valid inode? Is it OK to have a NULL inode for many/all of these parent dentries?

I think we should consider whether maintaining a tree-based hierarchical namespace requires all the parent inodes to be present. What would happen if we don't do that? Which operations on that namespace could fail?

> It could be worth to consider adopting a similar model.
Hi Nag, is this issue still relevant?
I went through this bug again. What I think should be done is to continue the investigation into the inode/dentry layout optimization that was brought up in comments #36 and #47. That is something which 1) should definitely happen at some point, but 2) won't happen in the foreseeable future (say, within the current upstream release cycle).
The bug got closed accidentally in comment #56. Reopening, as it's deemed to capture a relevant area for improvement.
Having consulted with Sunil, we decided to continue the investigation of the supposed root cause in an RFE (https://github.com/gluster/glusterfs/issues/1544, "file tree memory layout optimization"), and close this bz.