Bug 1530133

Summary: Client process hogs up 27G of resident size in memory (nearly 60%) post readdirs in loop
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Ambarish <asoman>
Component: glusterfs
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Bala Konda Reddy M <bmekala>
Severity: high
Priority: high
Version: rhgs-3.3
CC: amukherj, kdhananj, rgowdapp, rhinduja, rhs-bugs, sabose, vbellur
Target Milestone: ---
Target Release: ---
Keywords: ZStream
Hardware: aarch64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-14 05:21:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1647277

Description Ambarish 2018-01-02 07:03:31 UTC
Description of problem:
------------------------

An EC 2 x (4+2) volume mounted on 6 clients with nearly 860G of data.

Ran find and ls (readdir-heavy workloads) in a loop on the FUSE mounts for a day; a sketch of the workload follows.
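
For reference, a minimal sketch of the kind of readdir-heavy loop that was run; the exact commands were not captured, and /mnt/butcher is an assumed mount point:

# Sketch only: /mnt/butcher is an assumed mount point, not taken from the report.
while true; do
    find /mnt/butcher > /dev/null       # walks the whole tree, one readdir pass per directory
    ls -lR /mnt/butcher > /dev/null     # readdir plus a stat on every entry
done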


After 24 hours, I found the client process taking close to 27G of resident (RES) memory:


top - 01:59:56 up 3 days, 22:17,  1 user,  load average: 0.00, 0.06, 0.11
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 49279472 total, 14802152 free, 29241404 used,  5235916 buff/cache
KiB Swap: 24772604 total, 24772604 free,        0 used. 15173240 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                         
29439 root      20   0 27.529g 0.027t   5776 S   0.0 58.0   4437:57 memcheck-amd64- 



There's a chance of a leak here (and a potential OOM kill?).
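
To confirm steady growth rather than a one-off spike, the client's resident size can be sampled periodically. A sketch, assuming PID 29439 from the top output above:

pid=29439                                # glusterfs client PID taken from the top output above
while true; do
    date; grep VmRSS /proc/$pid/status   # resident set size as reported by the kernel
    sleep 300
done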



Version-Release number of selected component (if applicable):
-------------------------------------------------------------

glusterfs-3.8.4-52.3.el7rhgs.x86_64

How reproducible:
-----------------

1/1


Additional info:
----------------

 
Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 8163a57d-bedb-4603-b4dc-887144a433d9
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick2: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick5: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick6: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick7: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick8: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick9: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick10: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick11: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick12: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Options Reconfigured:
features.barrier: disable
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: enable
[root@gqas013 ~]#
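
A statedump of the FUSE client would show which xlators hold the memory (inode table counts, md-cache entries, mem pools). A sketch of how one could be captured; the pgrep pattern is an assumption, and the dump should land in GlusterFS's default statedump directory:

pid=$(pgrep -f 'glusterfs.*butcher')     # assumes a single FUSE mount of the butcher volume
kill -USR1 "$pid"                        # asks the client process to write a statedump
ls -lt /var/run/gluster/glusterdump.*    # typical default statedump location for clients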

Comment 5 Ambarish 2018-01-02 08:59:28 UTC
Just to be clear: the workload was stopped nearly 2 hours ago, but the memory usage of the mount process has not come down.
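
One way to check whether the kernel is still pinning inodes in the client (and so keeping inode-table and md-cache memory alive) would be to drop the VFS caches on the client and re-check the resident size. A sketch, reusing PID 29439 from the top output:

sync
echo 3 > /proc/sys/vm/drop_caches        # kernel forgets cached dentries/inodes, sending
                                         # FORGETs to the FUSE client so it can free inodes
sleep 60
grep VmRSS /proc/29439/status            # see whether the resident size drops afterwards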

Comment 6 Ambarish 2018-01-03 12:34:20 UTC
Not a regression.

I could reproduce it on 3.3.0 as well.

Comment 13 Sahina Bose 2019-11-14 05:21:26 UTC
Closing as this bug has not been actively worked on for over a year. Please re-open if the issue persists with the latest release.