Bug 1344843 - inode leak in brick process
Summary: inode leak in brick process
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: core
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Raghavendra G
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks: 1344885 RHGS-3.4-GSS-proposed-tracker
 
Reported: 2016-06-11 16:20 UTC by Raghavendra G
Modified: 2019-11-14 08:21 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1344885 (view as bug list)
Environment:
Last Closed: 2018-02-07 04:26:32 UTC
Target Upstream Version:


Attachments (Terms of Use)
brick statedump (314.08 KB, text/plain)
2016-06-11 16:24 UTC, Raghavendra G
client statedump (186.29 KB, text/plain)
2016-06-11 16:26 UTC, Raghavendra G
Inode leak statedump and fusedump (12.98 KB, application/x-gzip)
2016-09-15 05:06 UTC, Mohamed Ashiq
The attachment is a tar of the fuse client and the two bricks of the server (330.00 KB, application/x-tar)
2017-09-21 10:27 UTC, hari gowtham

Description Raghavendra G 2016-06-11 16:20:33 UTC
Description of problem:
There is a leak of inodes on the brick process.

[root@unused ~]# gluster volume info
 
Volume Name: ra
Type: Distribute
Volume ID: 258a8e92-678b-41db-ba8e-b273a360297d
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: booradley:/home/export-2/ra
Options Reconfigured:
diagnostics.brick-log-level: DEBUG
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet

Script:
[root@unused mnt]# for i in {1..150}; do echo $i; cp -rf /etc . && rm -rf *; done

After completion of script, I can see active inodes in the brick itable

[root@unused ~]# grep ra.active /var/run/gluster/home-export-2-ra.19609.dump.1465647069  
conn.0.bound_xl./home/export-2/ra.active_size=149

[root@unused ~]# grep ra.active /var/run/gluster/home-export-2-ra.19609.dump.1465647069  | wc -l
150

But the client fuse mount doesn't have any inodes.
[root@unused ~]# grep active /var/run/gluster/glusterdump.20612.dump.1465629006 | grep itable
xlator.mount.fuse.itable.active_size=1
[xlator.mount.fuse.itable.active.1]

I've not done a detailed RCA. But initial gut feeling is that there is one inode leak for every iteration of the loop. The leaked inode mostly corresponds to /mnt/etc.
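For quick checking, the active inode count can be pulled out of a statedump with a one-liner. A minimal sketch — the here-doc below is an illustrative fragment modeled on the grep output above, not a real dump; real dumps are written under /var/run/gluster/ when `gluster volume statedump <volname>` is run:

```shell
#!/bin/bash
# Sketch: extract the itable's active_size from a brick statedump.
# The here-doc is illustrative sample data modeled on the output above.
dump=$(mktemp)
cat > "$dump" <<'EOF'
conn.0.bound_xl./home/export-2/ra.active_size=149
[conn.0.bound_xl./home/export-2/ra.active.1]
gfid=00000000-0000-0000-0000-000000000001
EOF

# active_size lines have the form <prefix>.active_size=<count>
active=$(grep -o 'active_size=[0-9]*' "$dump" | cut -d= -f2)
echo "active inodes: $active"
rm -f "$dump"
```

After a run of the reproducer, comparing this count against the client's xlator.mount.fuse.itable.active_size shows the mismatch described above.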

Version-Release number of selected component (if applicable):
RHGS-3.1.3 git repo. Bug seen on upstream master too.

How reproducible:
Quite consistently

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Raghavendra G 2016-06-11 16:24:04 UTC
Created attachment 1166901 [details]
brick statedump

Comment 3 Raghavendra G 2016-06-11 16:26:49 UTC
Created attachment 1166915 [details]
client statedump

Comment 4 Raghavendra G 2016-06-12 07:47:03 UTC
There is a bug that prevents dumping the itable of the first connected client. To see the leaks in the itable for the test case mentioned in this bug, fix [1] is also required.
[1] http://review.gluster.org/14704

Comment 7 Raghavendra G 2016-06-14 07:15:44 UTC
On further investigation, the leak seems to be in io-threads xlator. If I switch off io-threads, the leak goes away.

Comment 11 Raghavendra G 2016-09-02 12:20:47 UTC
(In reply to Raghavendra G from comment #7)
> On further investigation, the leak seems to be in io-threads xlator. If I
> switch off io-threads, the leak goes away.

io-threads might just be introducing a race condition that causes the leak (in other components); there is no inode leak in the io-threads xlator itself.

Comment 12 Mohamed Ashiq 2016-09-15 05:06:47 UTC
Created attachment 1201131 [details]
Inode leak statedump and fusedump

Comment 13 Mohamed Ashiq 2016-09-15 05:28:24 UTC
Hi,

I did some more tests on the inode leak in the bricks. I have attached the statedump and fusedump. Here is what I did:

Created a 2x2 volume.

Mounted with the fuse dump option enabled:
# glusterfs --volfile-server=<IP ADDR> --dump-fuse=/home/dump.fdump --volfile-id=/vol /mnt/mount

# cd /mnt/mount

# for i in {1..50}; do mkdir healthy$i; cd healthy$i; echo dsfdsafsadfsad >> healthy$i; cd ../; done

# find .

# rm -rf ./*

# gluster volume statedump vol

# for i in {1..50}; do mkdir healthy$i; cd healthy$i; done

# cd /mnt/mount

# find .

# rm -rf ./*

# gluster volume statedump vol

The bricks show inode leaks when directories are created recursively on the mount (each inside the previous one, as in the second loop), but not when they are created at the same level (the first loop).

The fusedump was captured and parsed with https://github.com/csabahenk/parsefuse to make it human-readable. The fusedumps are attached.
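Since a statedump is taken before and after each round, the leaked inodes can be narrowed down by diffing the sets of active gfids between the two dumps. A minimal sketch — the here-docs are illustrative fragments modeled on the dumps in this bug, not real dump files:

```shell
#!/bin/bash
# Sketch: diff the active gfids of two statedumps (e.g. taken before
# and after the rm -rf) -- gfids that appear only in the later dump
# are leak candidates. The here-docs stand in for real dump files.
before=$(mktemp); after=$(mktemp)
cat > "$before" <<'EOF'
gfid=00000000-0000-0000-0000-000000000001
EOF
cat > "$after" <<'EOF'
gfid=00000000-0000-0000-0000-000000000001
gfid=65dbc91d-df91-4f74-bfc1-644e7bf3ccb6
EOF

sort "$before" > "$before.sorted"
sort "$after"  > "$after.sorted"
# comm -13 prints lines unique to the second file
leaked=$(comm -13 "$before.sorted" "$after.sorted" | cut -d= -f2)
echo "leak candidates: $leaked"
rm -f "$before" "$after" "$before.sorted" "$after.sorted"
```

In practice the gfid lines would first be grepped out of the full dumps before sorting.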

Comment 15 hari gowtham 2017-09-21 09:02:53 UTC
With the above observation alone, the fuse dump is not enough to come to a conclusion.

I tried recreating the issue but was unsuccessful.

Across many runs with different volume types, this is the statedump output for a plain distribute volume with two bricks:

brick1:
conn.0.bound_xl./data/gluster/bricks/b1.active_size=1
brick2:
conn.0.bound_xl./data/gluster/bricks/b2.active_size=1

Both have only the root inode active.

The fuse client dump:
xlator.mount.fuse.itable.active_size=1

Things are working fine on the current master,
so I'm closing this bug as not reproducible.
Feel free to reopen it if it is seen again.

Comment 18 hari gowtham 2017-09-21 10:20:05 UTC
The bug is valid in 3.3.1.

It happens at least once in every 5 tries.

BRICKS:

ag active_size /var/run/gluster/data-gluster-bricks-b1.742.dump.1505986755 
503:conn.0.bound_xl./data/gluster/bricks/b1.active_size=3

ag active_size /var/run/gluster/data-gluster-bricks-b2.761.dump.1505986756
503:conn.0.bound_xl./data/gluster/bricks/b2.active_size=2


brick1:
[conn.0.bound_xl./data/gluster/bricks/b1.active.1]
gfid=00000000-0000-0000-0000-000000000005
nlookup=15
fd-count=0
ref=1
ia_type=2

[xlator.features.locks.v1-locks.inode]
path=/.trashcan
mandatory=0

[conn.0.bound_xl./data/gluster/bricks/b1.active.2]
gfid=65dbc91d-df91-4f74-bfc1-644e7bf3ccb6
nlookup=0
fd-count=0
ref=1
ia_type=2

[xlator.features.locks.v1-locks.inode]
path=<gfid:65dbc91d-df91-4f74-bfc1-644e7bf3ccb6>
mandatory=0

[conn.0.bound_xl./data/gluster/bricks/b1.active.3]
gfid=00000000-0000-0000-0000-000000000001
nlookup=16
fd-count=0
ref=1
ia_type=2

[xlator.features.locks.v1-locks.inode]
path=/
mandatory=0

brick2:
[conn.0.bound_xl./data/gluster/bricks/b2.active.1]
gfid=00000000-0000-0000-0000-000000000005
nlookup=14
fd-count=0
ref=1
ia_type=2

[xlator.features.locks.v1-locks.inode]                                                                   
path=/.trashcan
mandatory=0

[conn.0.bound_xl./data/gluster/bricks/b2.active.2]
gfid=00000000-0000-0000-0000-000000000001
nlookup=16
fd-count=0
ref=1
ia_type=2

[xlator.features.locks.v1-locks.inode]
path=/
mandatory=0


client:
xlator.mount.fuse.itable.active_size=2

[xlator.mount.fuse.itable.active.1]
gfid=00000000-0000-0000-0000-000000000005
nlookup=1
fd-count=0
ref=1
ia_type=2

[xlator.cluster.dht.v1-dht.inode]
layout.cnt=2
layout.preset=0
layout.gen=3

[io-cache.inode]
inode.weight=1
path=/.trashcan
uuid=00000000-0000-0000-0000-000000000005

[xlator.mount.fuse.itable.active.2]
gfid=00000000-0000-0000-0000-000000000001
nlookup=0
fd-count=0
ref=1
ia_type=2

[xlator.cluster.dht.v1-dht.inode]
layout.cnt=2
layout.preset=0
layout.gen=3

[io-cache.inode]
inode.weight=1
path=/
uuid=00000000-0000-0000-0000-000000000001


As we can see, the client has 2 active inodes while brick1 has 3; the extra brick inode (gfid=65dbc91d-df91-4f74-bfc1-644e7bf3ccb6, with nlookup=0) is the leak.
Attaching the statedumps of the client and the server bricks.
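The telltale entry in a dump like the one above is an active inode with nlookup=0 but a ref still held. A small awk sketch that flags such entries — the here-doc mirrors the brick1 fragment above and is illustrative, not a real dump:

```shell
#!/bin/bash
# Sketch: flag active inodes with nlookup=0 but ref>0 -- in the dumps
# above these are the leak candidates. The here-doc mirrors brick1.
dump=$(mktemp)
cat > "$dump" <<'EOF'
[conn.0.bound_xl./data/gluster/bricks/b1.active.2]
gfid=65dbc91d-df91-4f74-bfc1-644e7bf3ccb6
nlookup=0
fd-count=0
ref=1
[conn.0.bound_xl./data/gluster/bricks/b1.active.3]
gfid=00000000-0000-0000-0000-000000000001
nlookup=16
fd-count=0
ref=1
EOF

suspects=$(awk -F= '
  /^gfid=/    { gfid = $2 }
  /^nlookup=/ { nlookup = $2 }
  /^ref=/     { if (nlookup == 0 && $2 > 0) print gfid }
' "$dump")
echo "suspects: $suspects"
rm -f "$dump"
```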

Comment 19 hari gowtham 2017-09-21 10:27:55 UTC
Created attachment 1328920 [details]
The attachment is a tar of the fuse client and the two bricks of the server

Comment 20 hari gowtham 2017-09-25 11:57:54 UTC
The bug is not reproducible in 3.12.

The result from the last try is:

➜  glusterfs git:(release-3.12) ✗ ag active_size /var/run/gluster/data-gluster-bricks-b1.31183.dump.1506340455 
502:conn.0.bound_xl./data/gluster/bricks/b1.active_size=1

➜  glusterfs git:(release-3.12) ✗ ag active_size /var/run/gluster/data-gluster-bricks-b2.31203.dump.1506340456 
502:conn.0.bound_xl./data/gluster/bricks/b2.active_size=1


➜  glusterfs git:(release-3.12) ✗ ag active_size /var/run/gluster/glusterdump.31425.dump.1506340470 
432:xlator.mount.fuse.itable.active_size=1

The active_size is correct for the bricks and client.

Comment 21 Amar Tumballi 2018-02-07 04:26:32 UTC
We have noticed that the bug cannot be reproduced in the latest version of the product (RHGS-3.3.1+).

If the bug is still relevant and is being reproduced, feel free to reopen the bug.

