Description of problem:
------------------------
EC 2*(4+2) volume mounted on 6 clients with nearly 860G of data. Ran find and ls on the FUSE mount for a day. After 24 hours, I found the client process taking close to 27G of RES memory:

top - 01:59:56 up 3 days, 22:17, 1 user, load average: 0.00, 0.06, 0.11
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 49279472 total, 14802152 free, 29241404 used,  5235916 buff/cache
KiB Swap: 24772604 total, 24772604 free,        0 used. 15173240 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
29439 root      20   0 27.529g 0.027t   5776 S   0.0 58.0   4437:57 memcheck-amd64-

There is a chance of a leak here (and a potential OOM kill?).

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.8.4-52.3.el7rhgs.x86_64

How reproducible:
-----------------
1/1

Additional info:
----------------
Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 8163a57d-bedb-4603-b4dc-887144a433d9
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick2: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick5: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick6: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick7: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick8: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick9: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick10: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick11: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Brick12: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/brick2
Options Reconfigured:
features.barrier: disable
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: enable
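For tracking whether the client's resident memory only ever grows, a minimal monitoring sketch (assumptions: a single fuse mount of the "butcher" volume whose command line carries --volfile-id=butcher, as a normal fuse mount does, and the default statedump directory /var/run/gluster; the /tmp/fuse-rss.log path is just an example):

# Locate the fuse client PID. The process here ran under valgrind (COMMAND
# shows memcheck-amd64-), so match on the volfile id, not the binary name.
PID=$(pgrep -f 'volfile-id=butcher' | head -n1)

# Sample VmRSS every 10 minutes; steady growth under a read-only find/ls
# workload points at a leak rather than cache warm-up.
while sleep 600; do echo "$(date +%s) $(grep VmRSS /proc/$PID/status)"; done >> /tmp/fuse-rss.log &

# SIGUSR1 makes a gluster process write a statedump (including per-xlator
# memory accounting) under /var/run/gluster/ for later comparison.
kill -USR1 "$PID"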
Just to be clear here, the workload was stopped nearly 2 hours ago, but the memory usage of the mount process has not come down.
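One way to separate lingering inode/dentry caches from a genuine leak (a sketch; note drop_caches is global and evicts all kernel caches on the box): force the kernel to send a FORGET for each cached fuse inode and re-check RSS. Whatever those inodes pinned on the client should come back; memory that still does not return is a stronger leak signal. PID is the fuse client PID from the earlier sketch.

sync
echo 3 > /proc/sys/vm/drop_caches   # evict dentries and inodes; fuse issues a FORGET per inode to the client
grep VmRSS /proc/"$PID"/status      # compare against the earlier samples in /tmp/fuse-rss.log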
Not a regression. I could reproduce it on 3.3.0 as well.
Closing as this bug has not been actively worked on for over a year. Please re-open if the issue persists with the latest release.