Description of problem:
------------------------
EC 2*(4+2) volume mounted on 6 clients with nearly 860G of data. Ran find and ls on the FUSE mount for a day. After 24 hours, I found the client process taking close to 27G of RES memory:

top - 01:59:56 up 3 days, 22:17, 1 user, load average: 0.00, 0.06, 0.11
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 49279472 total, 14802152 free, 29241404 used,  5235916 buff/cache
KiB Swap: 24772604 total, 24772604 free,        0 used. 15173240 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
29439 root      20   0 27.529g 0.027t   5776 S   0.0 58.0   4437:57 memcheck-amd64-

There is a chance of a leak here (and a potential OOM kill?).

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.8.4-52.3.el7rhgs.x86_64

How reproducible:
-----------------
1/1

Additional info:
----------------
Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 8163a57d-bedb-4603-b4dc-887144a433d9
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick2: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick5: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick6: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick7: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick8: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick9: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick10: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick11: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick12: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Options Reconfigured:
features.barrier: disable
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: enable
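For tracking whether the client's resident memory only ever grows, a minimal monitoring sketch (assumptions: a single fuse mount of the "butcher" volume whose command line carries --volfile-id=butcher, as a normal fuse mount does, and the default statedump directory /var/run/gluster; the /tmp/fuse-rss.log path is just an example):

# Locate the fuse client PID. The process here ran under valgrind (COMMAND
# shows memcheck-amd64-), so match on the volfile id, not the binary name.
PID=$(pgrep -f 'volfile-id=butcher' | head -n1)

# Sample VmRSS every 10 minutes; steady growth under a read-only find/ls
# workload points at a leak rather than cache warm-up.
while sleep 600; do echo "$(date +%s) $(grep VmRSS /proc/$PID/status)"; done >> /tmp/fuse-rss.log &

# SIGUSR1 makes a gluster process write a statedump (including per-xlator
# memory accounting) under /var/run/gluster/ for later comparison.
kill -USR1 "$PID"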
Just to be clear here, the workload was stopped nearly 2 hours ago, but the memory usage of the mount process has not come down.
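One way to separate lingering inode/dentry caches from a genuine leak (a sketch; note drop_caches is global and evicts all kernel caches on the box): force the kernel to send a FORGET for each cached fuse inode and re-check RSS. Whatever those inodes pinned on the client should come back; memory that still does not return is a stronger leak signal. PID is the fuse client PID from the earlier sketch.

sync
echo 3 > /proc/sys/vm/drop_caches   # evict dentries and inodes; fuse issues a FORGET per inode to the client
grep VmRSS /proc/"$PID"/status      # compare against the earlier samples in /tmp/fuse-rss.log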
Not a regression. I could reproduce it on 3.3.0 as well.
Closing as this bug has not been actively worked on for over a year. Please re-open if the issue persists with the latest release.