Bug 1755344

Summary: glustershd.log getting flooded with "W [inode.c:1017:inode_find] (-->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0xe3f9) [0x7fd09b0543f9] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0xe19c) [0x7fd09b05419 TABLE NOT FOUND"
Product: [Community] GlusterFS
Component: disperse
Version: mainline
Hardware: Unspecified
OS: Unspecified
Status: CLOSED NEXTRELEASE
Severity: high
Priority: unspecified
Reporter: Xavi Hernandez <jahernan>
Assignee: Xavi Hernandez <jahernan>
CC: amukherj, bugs, jahernan, nchilaka, rhs-bugs, storage-qa-internal
Clone Of: 1754790
Bug Blocks: 1754790
Last Closed: 2019-09-26 14:00:42 UTC

Description Xavi Hernandez 2019-09-25 09:47:37 UTC
Description of problem:
------------------------
The shd log file is getting flooded with the message below:

[2019-09-24 05:43:48.883399] W [inode.c:1017:inode_find] (-->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0xe3f9) [0x7f3b378513f9] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0xe19c) [0x7f3b3785119c] -->/lib64/libglusterfs.so.0(inode_find+0x92) [0x7f3b4a748112] ) 0-test-disperse-6: table not found


Version-Release number of selected component (if applicable):

How reproducible:

Seeing it consistently.

Steps to Reproduce:

Access a file while self-heal is repairing it.

Actual results:

The shd log is flooded with the log message above, and it even log-rotated within just 15 hours.

Comment 1 Xavi Hernandez 2019-09-25 09:50:00 UTC
The problem appears when an inodelk contention notification is received by the self-heal daemon. In this case, the function that manages it (ec_upcall() in ec.c) does this:

        case GF_UPCALL_INODELK_CONTENTION:
            lc = upcall->data;
            if (strcmp(lc->domain, ec->xl->name) != 0) {
                /* The lock is not owned by EC, ignore it. */
                return _gf_true;
            }
            /* Look up the contended inode in the inode table of the
             * top-most xlator of the graph. */
            inode = inode_find(((xlator_t *)ec->xl->graph->top)->itable,
                               upcall->gfid);

In the case of the self-heal daemon, ec->xl->graph->top corresponds to the debug/io-stats xlator, which doesn't have an inode table. This is harmless in itself because self-heal doesn't use eager locking, so there is no need to handle inodelk contention notifications: locks are released as soon as possible whether there is contention or not.

Normal client mounts do have an inode table on the top xlator, so this problem is not observed there.

I'll send a patch to prevent filling the logs in this case.
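
For illustration, below is a minimal, self-contained C sketch of that kind of guard: return early from the contention handler when the top xlator has no inode table, instead of calling inode_find() and logging "table not found" on every notification. The structures and names here (the inode_table stand-in, handle_inodelk_contention, etc.) are simplified mock-ups invented for this example, not the actual GlusterFS API; the real change lands in ec_upcall() in ec.c (see the review below).

    #include <stdbool.h>
    #include <stdio.h>

    /* Simplified stand-ins for the GlusterFS structures involved
     * (hypothetical; the real definitions live in the glusterfs headers). */
    typedef struct inode_table {
        int unused;
    } inode_table_t;

    typedef struct xlator {
        const char *name;
        inode_table_t *itable; /* NULL for debug/io-stats in the shd graph */
    } xlator_t;

    /* If the top xlator has no inode table (the self-heal daemon case),
     * there is no eager locking and hence no contention to manage, so
     * return immediately without looking anything up or logging. */
    static bool
    handle_inodelk_contention(const xlator_t *top)
    {
        if (top->itable == NULL) {
            return true; /* silently ignore: nothing to release early */
        }
        /* Client mount case: the inode table exists, so the contended
         * inode could be looked up and its lock released early (elided). */
        printf("%s: looking up contended inode\n", top->name);
        return true;
    }

    int
    main(void)
    {
        inode_table_t table = {0};
        xlator_t shd_top = {"debug/io-stats", NULL};
        xlator_t client_top = {"fuse", &table};

        handle_inodelk_contention(&shd_top);    /* no-op, no log flood */
        handle_inodelk_contention(&client_top); /* proceeds to lookup */
        return 0;
    }

With a check along these lines, the self-heal daemon simply ignores the notification, while client mounts keep the early-release behaviour.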

Comment 2 Worker Ant 2019-09-25 10:08:24 UTC
REVIEW: https://review.gluster.org/23481 (cluster/ec: prevent filling shd log with "table not found" messages) posted (#1) for review on master by Xavi Hernandez

Comment 3 Worker Ant 2019-09-26 14:00:42 UTC
REVIEW: https://review.gluster.org/23481 (cluster/ec: prevent filling shd log with "table not found" messages) merged (#2) on master by Pranith Kumar Karampuri