Bug 1330881 - Inode leaks found in data-self-heal
Summary: Inode leaks found in data-self-heal
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Pranith Kumar K
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On: 1329773
Blocks: 1311817 1329779
 
Reported: 2016-04-27 08:38 UTC by Pranith Kumar K
Modified: 2016-09-17 12:15 UTC (History)
CC: 7 users

Fixed In Version: glusterfs-3.7.9-3
Doc Type: Bug Fix
Doc Text:
Clone Of: 1329773
Environment:
Last Closed: 2016-06-23 05:19:54 UTC
Embargoed:


Attachments (Terms of Use)
qe validation logs (12.81 MB, application/x-tar)
2016-05-06 07:00 UTC, Nag Pavan Chilakam
qe validation logs#2 (12.77 MB, application/x-tar)
2016-05-06 07:02 UTC, Nag Pavan Chilakam


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1240 0 normal SHIPPED_LIVE Red Hat Gluster Storage 3.1 Update 3 2016-06-23 08:51:28 UTC

Description Pranith Kumar K 2016-04-27 08:38:27 UTC
+++ This bug was initially created as a clone of Bug #1329773 +++

Description of problem:
Olia found an inode leak during review of https://github.com/gluster/glusterfs/commit/b8106d1127f034ffa88b5dd322c23a10e023b9b6

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Vijay Bellur on 2016-04-22 20:11:53 EDT ---

REVIEW: http://review.gluster.org/14052 (cluster/afr: Fix inode-leak in data self-heal) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-04-22 22:15:09 EDT ---

REVIEW: http://review.gluster.org/14052 (cluster/afr: Fix inode-leak in data self-heal) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-04-23 21:55:32 EDT ---

REVIEW: http://review.gluster.org/14052 (cluster/afr: Fix inode-leak in data self-heal) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-04-25 02:17:46 EDT ---

COMMIT: http://review.gluster.org/14052 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 13e458cd70ac1943cf68d95a2c6517663626c64a
Author: Pranith Kumar K <pkarampu>
Date:   Sat Apr 23 05:30:08 2016 +0530

    cluster/afr: Fix inode-leak in data self-heal
    
    Thanks to Olia-Kremmyda for finding the bug on github review,
    https://github.com/gluster/glusterfs/commit/b8106d1127f034ffa88b5dd322c23a10e023b9b6
    
    Change-Id: Ib8640ed0c331a635971d5d12052f0959c24f76a2
    BUG: 1329773
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/14052
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Ravishankar N <ravishankar>
    Reviewed-by: Krutika Dhananjay <kdhananj>
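
For context on the bug class: the patch balances an inode reference that the data self-heal path took but never released. The following is a minimal, self-contained C model of that ref/unref pattern; it is only an illustration (my_inode_t, heal_data_leaky and heal_data_fixed are made-up names, not the actual afr code; see the Gerrit change above for the real patch).

/* Simplified, self-contained model of the reference-counting bug class fixed
 * here: a code path takes a reference on an inode-like object but never drops
 * it, so the object can never be freed.  Illustrative only, not GlusterFS code. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int refcount;
} my_inode_t;

static my_inode_t *inode_new(void)
{
    my_inode_t *in = calloc(1, sizeof(*in));
    in->refcount = 1;                     /* creator holds the first reference */
    return in;
}

static my_inode_t *inode_ref(my_inode_t *in)
{
    in->refcount++;
    return in;
}

static void inode_unref(my_inode_t *in)
{
    if (--in->refcount == 0) {
        free(in);                         /* last reference gone: memory released */
        printf("inode freed\n");
    }
}

/* Buggy version: the reference taken for the heal is never dropped. */
static void heal_data_leaky(my_inode_t *in)
{
    my_inode_t *linked = inode_ref(in);   /* ref taken for the duration of the heal */
    (void)linked;
    /* ... heal work ... */
    /* missing inode_unref(linked): refcount stays one higher than before */
}

/* Fixed version: every reference taken on entry is dropped on exit. */
static void heal_data_fixed(my_inode_t *in)
{
    my_inode_t *linked = inode_ref(in);
    /* ... heal work ... */
    inode_unref(linked);                  /* balance the ref taken above */
}

int main(void)
{
    my_inode_t *in = inode_new();
    heal_data_leaky(in);
    printf("after leaky heal, refcount = %d\n", in->refcount);  /* prints 2 */
    heal_data_fixed(in);
    printf("after fixed heal, refcount = %d\n", in->refcount);  /* still 2 */
    inode_unref(in);   /* drops the creator's ref; the leaked ref keeps the object alive */
    return 0;
}

Compiling and running this model (gcc leak_model.c && ./a.out) shows the refcount ending one higher than it should, so "inode freed" never prints; over thousands of heals that is the kind of slow growth an inode leak turns into unbounded memory use.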

Comment 8 Nag Pavan Chilakam 2016-05-06 06:54:29 UTC
As there is no specific way of testing this functionally, I talked with the developer and did the testing below.
Moving to Verified with the available information.


    Had a 6-node setup.

    Created a 1x2 volume.

    FUSE-mounted the volume, created some files, and copied a Linux kernel tarball.

    Brought down brick-0.

    From two different mounts ran the following I/O:

    scp'ed video files from my laptop to the volume mount.

    Untarred the kernel.

    In all there was about 40 GB of data to heal.

    Started the heal by bringing the brick back up with a force start.

    While the heal was going on, issued heal info continuously in a loop (with a sleep of 10 s).

    Also, from one mount, used dd in a loop to create 400 MB files for as long as the heal was happening (created about 42 files).

    The heal took about 1 hour to complete.

    I watched the shd process memory consumption for both bricks and did not see any change in consumption. On average, the CPU consumption shown by top was 1.1%.

    The shd process CPU usage on the source was between 50% and 90% during the healing (a minimal RSS-polling sketch follows this step list).
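
Not part of the original steps, just a repeatable way to watch shd memory as described above: the sketch below polls the self-heal daemon's resident set size from /proc on the same 10 s cadence as the heal-info loop. Pass it the shd PID reported by gluster v status. This is QE-side convenience code, not part of GlusterFS, and the file name shd_rss.c is made up.

/* Poll VmRSS of a given PID from /proc/<pid>/status every 10 seconds.
 * A steadily climbing VmRSS during a long heal would point at a leak. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }

    char path[64];
    snprintf(path, sizeof(path), "/proc/%s/status", argv[1]);

    for (;;) {
        FILE *fp = fopen(path, "r");
        if (!fp) {
            perror("fopen");              /* process exited or bad PID */
            return 1;
        }
        char line[256];
        while (fgets(line, sizeof(line), fp)) {
            if (strncmp(line, "VmRSS:", 6) == 0) {
                fputs(line, stdout);      /* e.g. "VmRSS:   123456 kB" */
                fflush(stdout);
                break;
            }
        }
        fclose(fp);
        sleep(10);                        /* same 10 s cadence as the heal-info loop */
    }
    return 0;
}

Build with gcc -o shd_rss shd_rss.c and run ./shd_rss <shd-pid>; a flat VmRSS over the roughly one-hour heal matches the "no change in consumption" observation above.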
In the 1x2 volume:

Ran heal info in a loop while the actual heal was going on, to test bug 1330881 - Inode leaks found in data-self-heal.
There was about 40 GB of data to be healed (one untarred Linux kernel, plus folders containing many large video files).

It took about 1 hour to heal the complete data:
##############################
[root@dhcp35-191 ~]# gluster v status olia
Status of volume: olia
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.98:/rhs/brick4/olia          N/A       N/A        N       N/A  
Brick 10.70.35.64:/rhs/brick4/olia          49170     0          Y       6547 
NFS Server on localhost                     2049      0          Y       20318
Self-heal Daemon on localhost               N/A       N/A        Y       20326
NFS Server on 10.70.35.27                   2049      0          Y       5753 
Self-heal Daemon on 10.70.35.27             N/A       N/A        Y       5761 
NFS Server on 10.70.35.114                  2049      0          Y       5869 
Self-heal Daemon on 10.70.35.114            N/A       N/A        Y       5877 
NFS Server on 10.70.35.44                   2049      0          Y       32066
Self-heal Daemon on 10.70.35.44             N/A       N/A        Y       32074
NFS Server on 10.70.35.98                   2049      0          Y       4823 
Self-heal Daemon on 10.70.35.98             N/A       N/A        Y       4832 
NFS Server on 10.70.35.64                   2049      0          Y       6574 
Self-heal Daemon on 10.70.35.64             N/A       N/A        Y       6583 
 
Task Status of Volume olia
------------------------------------------------------------------------------
There are no active volume tasks


 
Volume Name: olia
Type: Replicate
Volume ID: 6ad242d2-b0cb-441d-97a7-8fa2db693e05
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.35.98:/rhs/brick4/olia
Brick2: 10.70.35.64:/rhs/brick4/olia
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.readdir-ahead: on



(Profiling was turned on, as reflected in the diagnostics options above.)

Comment 9 Nag Pavan Chilakam 2016-05-06 07:00:06 UTC
Created attachment 1154497 [details]
qe validation logs

Comment 10 Nag Pavan Chilakam 2016-05-06 07:02:48 UTC
Created attachment 1154498 [details]
qe validation logs#2

Comment 12 errata-xmlrpc 2016-06-23 05:19:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

