Bug 1530133

Summary: Client process hogs up 27G of resident size in memory (nearly 60%) post readdirs in loop
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Ambarish <asoman>
Component: glusterfs
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Bala Konda Reddy M <bmekala>
Severity: high
Priority: high
Version: rhgs-3.3
CC: amukherj, kdhananj, rgowdapp, rhinduja, rhs-bugs, sabose, vbellur
Target Milestone: ---
Target Release: ---
Keywords: ZStream
Hardware: aarch64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-14 05:21:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1647277

Description Ambarish 2018-01-02 07:03:31 UTC
Description of problem:
------------------------

An EC 2 x (4+2) volume mounted on 6 clients with nearly 860G of data.

Ran find and ls (readdir-heavy workloads) in a loop on the FUSE mounts for a day; a sketch of the workload follows.
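
For reference, a minimal sketch of the kind of readdir-heavy loop that was run; the exact commands were not captured, and /mnt/butcher is an assumed mount point:

# Sketch only: /mnt/butcher is an assumed mount point, not taken from the report.
while true; do
    find /mnt/butcher > /dev/null       # walks the whole tree, one readdir pass per directory
    ls -lR /mnt/butcher > /dev/null     # readdir plus a stat on every entry
done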


After 24 hours, I found the client process taking close to 27G of resident (RES) memory:


top - 01:59:56 up 3 days, 22:17,  1 user,  load average: 0.00, 0.06, 0.11
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 49279472 total, 14802152 free, 29241404 used,  5235916 buff/cache
KiB Swap: 24772604 total, 24772604 free,        0 used. 15173240 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                         
29439 root      20   0 27.529g 0.027t   5776 S   0.0 58.0   4437:57 memcheck-amd64- 



There's a chance of a leak here (and a potential OOM kill?).
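
To confirm steady growth rather than a one-off spike, the client's resident size can be sampled periodically. A sketch, assuming PID 29439 from the top output above:

pid=29439                                # glusterfs client PID taken from the top output above
while true; do
    date; grep VmRSS /proc/$pid/status   # resident set size as reported by the kernel
    sleep 300
done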



Version-Release number of selected component (if applicable):
-------------------------------------------------------------

glusterfs-3.8.4-52.3.el7rhgs.x86_64

How reproducible:
-----------------

1/1


Additional info:
----------------

 
Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 8163a57d-bedb-4603-b4dc-887144a433d9
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick2: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick5: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick6: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick7: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick8: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick9: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick10: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick11: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick12: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Options Reconfigured:
features.barrier: disable
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: enable
[root@gqas013 ~]#
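
A statedump of the FUSE client would show which xlators hold the memory (inode table counts, md-cache entries, mem pools). A sketch of how one could be captured; the pgrep pattern is an assumption, and the dump should land in GlusterFS's default statedump directory:

pid=$(pgrep -f 'glusterfs.*butcher')     # assumes a single FUSE mount of the butcher volume
kill -USR1 "$pid"                        # asks the client process to write a statedump
ls -lt /var/run/gluster/glusterdump.*    # typical default statedump location for clients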

Comment 5 Ambarish 2018-01-02 08:59:28 UTC
Just to be clear: the workload was stopped nearly 2 hours ago, but the memory usage of the mount process has not come down.
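
One way to check whether the kernel is still pinning inodes in the client (and so keeping inode-table and md-cache memory alive) would be to drop the VFS caches on the client and re-check the resident size. A sketch, reusing PID 29439 from the top output:

sync
echo 3 > /proc/sys/vm/drop_caches        # kernel forgets cached dentries/inodes, sending
                                         # FORGETs to the FUSE client so it can free inodes
sleep 60
grep VmRSS /proc/29439/status            # see whether the resident size drops afterwards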

Comment 6 Ambarish 2018-01-03 12:34:20 UTC
Not a regression.

I could reproduce it on 3.3.0 as well.

Comment 13 Sahina Bose 2019-11-14 05:21:26 UTC
Closing as this bug has not been actively worked on for over a year. Please re-open if the issue persists with the latest release.