Bug 1755344

Summary: glustershd.log getting flooded with "W [inode.c:1017:inode_find] (-->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0xe3f9) [0x7fd09b0543f9] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0xe19c) [0x7fd09b05419 TABLE NOT FOUND"
Product: [Community] GlusterFS
Component: disperse
Version: mainline
Hardware: Unspecified
OS: Unspecified
Status: CLOSED NEXTRELEASE
Severity: high
Priority: unspecified
Reporter: Xavi Hernandez <jahernan>
Assignee: Xavi Hernandez <jahernan>
CC: amukherj, bugs, jahernan, nchilaka, rhs-bugs, storage-qa-internal
Clone Of: 1754790
Bug Blocks: 1754790
Last Closed: 2019-09-26 14:00:42 UTC

Description Xavi Hernandez 2019-09-25 09:47:37 UTC
Description of problem:
------------------------
The shd log file is getting flooded with the message below:

[2019-09-24 05:43:48.883399] W [inode.c:1017:inode_find] (-->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0xe3f9) [0x7f3b378513f9] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0xe19c) [0x7f3b3785119c] -->/lib64/libglusterfs.so.0(inode_find+0x92) [0x7f3b4a748112] ) 0-test-disperse-6: table not found


Version-Release number of selected component (if applicable):

How reproducible:

Seeing it consistently.

Steps to Reproduce:

Access a file while self-heal is repairing it.

Actual results:

The shd log is flooded with the log message above, and it even log-rotated within just 15 hours.

Comment 1 Xavi Hernandez 2019-09-25 09:50:00 UTC
The problem appears when an inodelk contention notification is received by the self-heal daemon. In this case, the function that manages it (ec_upcall() in ec.c) does this:

        case GF_UPCALL_INODELK_CONTENTION:
            lc = upcall->data;
            if (strcmp(lc->domain, ec->xl->name) != 0) {
                /* The lock is not owned by EC, ignore it. */
                return _gf_true;
            }
            /* Look up the contended inode in the inode table of the
             * top-most xlator of the graph. */
            inode = inode_find(((xlator_t *)ec->xl->graph->top)->itable,
                               upcall->gfid);

In the case of the self-heal daemon, ec->xl->graph->top corresponds to the debug/io-stats xlator, which doesn't have an inode table. This is harmless in itself because self-heal doesn't use eager locking, so there is no need to handle inodelk contention notifications: locks are released as soon as possible whether there is contention or not.

Normal client mounts do have an inode table on the top xlator, so this problem is not observed there.

I'll send a patch to prevent filling the logs in this case.
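
For illustration, below is a minimal, self-contained C sketch of that kind of guard: return early from the contention handler when the top xlator has no inode table, instead of calling inode_find() and logging "table not found" on every notification. The structures and names here (the inode_table stand-in, handle_inodelk_contention, etc.) are simplified mock-ups invented for this example, not the actual GlusterFS API; the real change lands in ec_upcall() in ec.c (see the review below).

    #include <stdbool.h>
    #include <stdio.h>

    /* Simplified stand-ins for the GlusterFS structures involved
     * (hypothetical; the real definitions live in the glusterfs headers). */
    typedef struct inode_table {
        int unused;
    } inode_table_t;

    typedef struct xlator {
        const char *name;
        inode_table_t *itable; /* NULL for debug/io-stats in the shd graph */
    } xlator_t;

    /* If the top xlator has no inode table (the self-heal daemon case),
     * there is no eager locking and hence no contention to manage, so
     * return immediately without looking anything up or logging. */
    static bool
    handle_inodelk_contention(const xlator_t *top)
    {
        if (top->itable == NULL) {
            return true; /* silently ignore: nothing to release early */
        }
        /* Client mount case: the inode table exists, so the contended
         * inode could be looked up and its lock released early (elided). */
        printf("%s: looking up contended inode\n", top->name);
        return true;
    }

    int
    main(void)
    {
        inode_table_t table = {0};
        xlator_t shd_top = {"debug/io-stats", NULL};
        xlator_t client_top = {"fuse", &table};

        handle_inodelk_contention(&shd_top);    /* no-op, no log flood */
        handle_inodelk_contention(&client_top); /* proceeds to lookup */
        return 0;
    }

With a check along these lines, the self-heal daemon simply ignores the notification, while client mounts keep the early-release behaviour.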

Comment 2 Worker Ant 2019-09-25 10:08:24 UTC
REVIEW: https://review.gluster.org/23481 (cluster/ec: prevent filling shd log with "table not found" messages) posted (#1) for review on master by Xavi Hernandez

Comment 3 Worker Ant 2019-09-26 14:00:42 UTC
REVIEW: https://review.gluster.org/23481 (cluster/ec: prevent filling shd log with "table not found" messages) merged (#2) on master by Pranith Kumar Karampuri