Bug 1344843 - inode leak in brick process
Summary: inode leak in brick process
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: core
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Raghavendra G
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks: 1344885 RHGS-3.4-GSS-proposed-tracker
 
Reported: 2016-06-11 16:20 UTC by Raghavendra G
Modified: 2019-11-14 08:21 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1344885 (view as bug list)
Environment:
Last Closed: 2018-02-07 04:26:32 UTC
Target Upstream Version:


Attachments (Terms of Use)
brick statedump (314.08 KB, text/plain)
2016-06-11 16:24 UTC, Raghavendra G
client statedump (186.29 KB, text/plain)
2016-06-11 16:26 UTC, Raghavendra G
Inode leak statedump and fusedump (12.98 KB, application/x-gzip)
2016-09-15 05:06 UTC, Mohamed Ashiq
The attachment is a tar of the fuse client and the two bricks of the server (330.00 KB, application/x-tar)
2017-09-21 10:27 UTC, hari gowtham

Description Raghavendra G 2016-06-11 16:20:33 UTC
Description of problem:
There is a leak of inodes on the brick process.

[root@unused ~]# gluster volume info
 
Volume Name: ra
Type: Distribute
Volume ID: 258a8e92-678b-41db-ba8e-b273a360297d
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: booradley:/home/export-2/ra
Options Reconfigured:
diagnostics.brick-log-level: DEBUG
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet

Script:
[root@unused mnt]# for i in {1..150}; do echo $i; cp -rf /etc . && rm -rf *; done

After completion of script, I can see active inodes in the brick itable

[root@unused ~]# grep ra.active /var/run/gluster/home-export-2-ra.19609.dump.1465647069  
conn.0.bound_xl./home/export-2/ra.active_size=149

[root@unused ~]# grep ra.active /var/run/gluster/home-export-2-ra.19609.dump.1465647069  | wc -l
150

But the client fuse mount doesn't have any inodes.
[root@unused ~]# grep active /var/run/gluster/glusterdump.20612.dump.1465629006 | grep itable
xlator.mount.fuse.itable.active_size=1
[xlator.mount.fuse.itable.active.1]

I've not done a detailed RCA. But initial gut feeling is that there is one inode leak for every iteration of the loop. The leaked inode mostly corresponds to /mnt/etc.
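For quick checking, the active inode count can be pulled out of a statedump with a one-liner. A minimal sketch — the here-doc below is an illustrative fragment modeled on the grep output above, not a real dump; real dumps are written under /var/run/gluster/ when `gluster volume statedump <volname>` is run:

```shell
#!/bin/bash
# Sketch: extract the itable's active_size from a brick statedump.
# The here-doc is illustrative sample data modeled on the output above.
dump=$(mktemp)
cat > "$dump" <<'EOF'
conn.0.bound_xl./home/export-2/ra.active_size=149
[conn.0.bound_xl./home/export-2/ra.active.1]
gfid=00000000-0000-0000-0000-000000000001
EOF

# active_size lines have the form <prefix>.active_size=<count>
active=$(grep -o 'active_size=[0-9]*' "$dump" | cut -d= -f2)
echo "active inodes: $active"
rm -f "$dump"
```

After a run of the reproducer, comparing this count against the client's xlator.mount.fuse.itable.active_size shows the mismatch described above.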

Version-Release number of selected component (if applicable):
RHGS-3.1.3 git repo. Bug seen on upstream master too.

How reproducible:
Quite consistently

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Raghavendra G 2016-06-11 16:24:04 UTC
Created attachment 1166901 [details]
brick statedump

Comment 3 Raghavendra G 2016-06-11 16:26:49 UTC
Created attachment 1166915 [details]
client statedump

Comment 4 Raghavendra G 2016-06-12 07:47:03 UTC
There is a bug that prevents dumping the itable of the first connected client. To see the leaks in the itable for the test case mentioned in this bug, fix [1] is also required.
[1] http://review.gluster.org/14704

Comment 7 Raghavendra G 2016-06-14 07:15:44 UTC
On further investigation, the leak seems to be in io-threads xlator. If I switch off io-threads, the leak goes away.

Comment 11 Raghavendra G 2016-09-02 12:20:47 UTC
(In reply to Raghavendra G from comment #7)
> On further investigation, the leak seems to be in io-threads xlator. If I
> switch off io-threads, the leak goes away.

io-threads might just be introducing a race condition that causes the leak (in other components); there is no inode leak in the io-threads xlator itself.

Comment 12 Mohamed Ashiq 2016-09-15 05:06:47 UTC
Created attachment 1201131 [details]
Inode leak statedump and fusedump

Comment 13 Mohamed Ashiq 2016-09-15 05:28:24 UTC
Hi,

I did some more tests on the inode leak in the bricks. I have attached the statedump and fusedump. Here is what I did:

Created a 2x2 volume.

Mounted with the fuse dump option enabled:
# glusterfs --volfile-server=<IP ADDR> --dump-fuse=/home/dump.fdump --volfile-id=/vol /mnt/mount

# cd /mnt/mount

# for i in {1..50}; do mkdir healthy$i; cd healthy$i; echo dsfdsafsadfsad >> healthy$i; cd ../; done

# find .

# rm -rf ./*

# gluster volume statedump vol

# for i in {1..50}; do mkdir healthy$i; cd healthy$i; done

# cd /mnt/mount

# find .

# rm -rf ./*

# gluster volume statedump vol

The bricks show inode leaks when directories are created recursively on the mount (each inside the previous one, as in the second loop), but not when they are created at the same level (the first loop).

The fusedump was captured and parsed with https://github.com/csabahenk/parsefuse to make it human-readable. The fusedumps are attached.
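Since a statedump is taken before and after each round, the leaked inodes can be narrowed down by diffing the sets of active gfids between the two dumps. A minimal sketch — the here-docs are illustrative fragments modeled on the dumps in this bug, not real dump files:

```shell
#!/bin/bash
# Sketch: diff the active gfids of two statedumps (e.g. taken before
# and after the rm -rf) -- gfids that appear only in the later dump
# are leak candidates. The here-docs stand in for real dump files.
before=$(mktemp); after=$(mktemp)
cat > "$before" <<'EOF'
gfid=00000000-0000-0000-0000-000000000001
EOF
cat > "$after" <<'EOF'
gfid=00000000-0000-0000-0000-000000000001
gfid=65dbc91d-df91-4f74-bfc1-644e7bf3ccb6
EOF

sort "$before" > "$before.sorted"
sort "$after"  > "$after.sorted"
# comm -13 prints lines unique to the second file
leaked=$(comm -13 "$before.sorted" "$after.sorted" | cut -d= -f2)
echo "leak candidates: $leaked"
rm -f "$before" "$after" "$before.sorted" "$after.sorted"
```

In practice the gfid lines would first be grepped out of the full dumps before sorting.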

Comment 15 hari gowtham 2017-09-21 09:02:53 UTC
With the above observation alone, the fuse dump is not enough to come to a conclusion.

I tried recreating the issue but was unsuccessful.

Across many runs with different volume types, this is the statedump output for a plain distribute volume with two bricks:

brick1:
conn.0.bound_xl./data/gluster/bricks/b1.active_size=1
brick2:
conn.0.bound_xl./data/gluster/bricks/b2.active_size=1

Both have only the root inode active.

The fuse client dump:
xlator.mount.fuse.itable.active_size=1

Things are working fine on the current master,
so I'm closing this bug as not reproducible.
Feel free to reopen it if it is seen again.

Comment 18 hari gowtham 2017-09-21 10:20:05 UTC
The bug is valid in 3.3.1.

It happens at least once in every 5 tries.

BRICKS:

ag active_size /var/run/gluster/data-gluster-bricks-b1.742.dump.1505986755 
503:conn.0.bound_xl./data/gluster/bricks/b1.active_size=3

ag active_size /var/run/gluster/data-gluster-bricks-b2.761.dump.1505986756
503:conn.0.bound_xl./data/gluster/bricks/b2.active_size=2


brick1:
[conn.0.bound_xl./data/gluster/bricks/b1.active.1]
gfid=00000000-0000-0000-0000-000000000005
nlookup=15
fd-count=0
ref=1
ia_type=2

[xlator.features.locks.v1-locks.inode]
path=/.trashcan
mandatory=0

[conn.0.bound_xl./data/gluster/bricks/b1.active.2]
gfid=65dbc91d-df91-4f74-bfc1-644e7bf3ccb6
nlookup=0
fd-count=0
ref=1
ia_type=2

[xlator.features.locks.v1-locks.inode]
path=<gfid:65dbc91d-df91-4f74-bfc1-644e7bf3ccb6>
mandatory=0

[conn.0.bound_xl./data/gluster/bricks/b1.active.3]
gfid=00000000-0000-0000-0000-000000000001
nlookup=16
fd-count=0
ref=1
ia_type=2

[xlator.features.locks.v1-locks.inode]
path=/
mandatory=0

brick2:
[conn.0.bound_xl./data/gluster/bricks/b2.active.1]
gfid=00000000-0000-0000-0000-000000000005
nlookup=14
fd-count=0
ref=1
ia_type=2

[xlator.features.locks.v1-locks.inode]                                                                   
path=/.trashcan
mandatory=0

[conn.0.bound_xl./data/gluster/bricks/b2.active.2]
gfid=00000000-0000-0000-0000-000000000001
nlookup=16
fd-count=0
ref=1
ia_type=2

[xlator.features.locks.v1-locks.inode]
path=/
mandatory=0


client:
xlator.mount.fuse.itable.active_size=2

[xlator.mount.fuse.itable.active.1]
gfid=00000000-0000-0000-0000-000000000005
nlookup=1
fd-count=0
ref=1
ia_type=2

[xlator.cluster.dht.v1-dht.inode]
layout.cnt=2
layout.preset=0
layout.gen=3

[io-cache.inode]
inode.weight=1
path=/.trashcan
uuid=00000000-0000-0000-0000-000000000005

[xlator.mount.fuse.itable.active.2]
gfid=00000000-0000-0000-0000-000000000001
nlookup=0
fd-count=0
ref=1
ia_type=2

[xlator.cluster.dht.v1-dht.inode]
layout.cnt=2
layout.preset=0
layout.gen=3

[io-cache.inode]
inode.weight=1
path=/
uuid=00000000-0000-0000-0000-000000000001


As we can see, the client has 2 active inodes while brick1 has 3; the extra brick inode (gfid=65dbc91d-df91-4f74-bfc1-644e7bf3ccb6, with nlookup=0) is the leak.
Attaching the statedumps of the client and the server bricks.
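The telltale entry in a dump like the one above is an active inode with nlookup=0 but a ref still held. A small awk sketch that flags such entries — the here-doc mirrors the brick1 fragment above and is illustrative, not a real dump:

```shell
#!/bin/bash
# Sketch: flag active inodes with nlookup=0 but ref>0 -- in the dumps
# above these are the leak candidates. The here-doc mirrors brick1.
dump=$(mktemp)
cat > "$dump" <<'EOF'
[conn.0.bound_xl./data/gluster/bricks/b1.active.2]
gfid=65dbc91d-df91-4f74-bfc1-644e7bf3ccb6
nlookup=0
fd-count=0
ref=1
[conn.0.bound_xl./data/gluster/bricks/b1.active.3]
gfid=00000000-0000-0000-0000-000000000001
nlookup=16
fd-count=0
ref=1
EOF

suspects=$(awk -F= '
  /^gfid=/    { gfid = $2 }
  /^nlookup=/ { nlookup = $2 }
  /^ref=/     { if (nlookup == 0 && $2 > 0) print gfid }
' "$dump")
echo "suspects: $suspects"
rm -f "$dump"
```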

Comment 19 hari gowtham 2017-09-21 10:27:55 UTC
Created attachment 1328920 [details]
The attachment is a tar of the fuse client and the two bricks of the server

Comment 20 hari gowtham 2017-09-25 11:57:54 UTC
The bug is not reproducible in 3.12.

The result from the last try is:

➜  glusterfs git:(release-3.12) ✗ ag active_size /var/run/gluster/data-gluster-bricks-b1.31183.dump.1506340455 
502:conn.0.bound_xl./data/gluster/bricks/b1.active_size=1

➜  glusterfs git:(release-3.12) ✗ ag active_size /var/run/gluster/data-gluster-bricks-b2.31203.dump.1506340456 
502:conn.0.bound_xl./data/gluster/bricks/b2.active_size=1


➜  glusterfs git:(release-3.12) ✗ ag active_size /var/run/gluster/glusterdump.31425.dump.1506340470 
432:xlator.mount.fuse.itable.active_size=1

The active_size is correct for the bricks and client.

Comment 21 Amar Tumballi 2018-02-07 04:26:32 UTC
We have noticed that the bug cannot be reproduced in the latest version of the product (RHGS-3.3.1+).

If the bug is still relevant and is being reproduced, feel free to reopen the bug.

