Bug 1335373
| Summary: | cyclic dentry loop in inode table | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Raghavendra G <rgowdapp> |
| Component: | fuse | Assignee: | bugs <bugs> |
| Status: | CLOSED UPSTREAM | QA Contact: | |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | mainline | CC: | bugs, csaba, khiremat, kiyer |
| Target Milestone: | --- | Keywords: | Reopened, Triaged |
| Target Release: | --- | Flags: | khiremat: needinfo- |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-03-12 12:51:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1092606 | | |
| Bug Blocks: | | | |
Description
Raghavendra G
2016-05-12 06:31:09 UTC
http://review.gluster.org/#/c/13451 is one way of solving this issue. DFS should be loaded as an immediate child of these access layers. With the dentry operations being serialized, we don't see the race outlined above. Other problems caused by stale dentries for a directory are discussed in this mail thread: http://comments.gmane.org/gmane.comp.file-systems.gluster.devel/15287

---

Setting to POST in acknowledgement of https://review.gluster.org/13451 as an attempt to solve this.

Comment 4
Amar Tumballi

This can be resolved by `gluster volume set $VOL features.sdfs enable` (the above patch finally made it to the repo).

Comment 5
Raghavendra G

(In reply to Amar Tumballi from comment #4)
> This can be resolved by `gluster volume set $VOL features.sdfs enable` (the
> above patch finally made it to the repo).

But doesn't enabling sdfs regress performance significantly, making it not a viable solution? Note that we have run into this problem even in the client's inode table. Serializing on the client makes glusterfs almost useless due to the serious performance drop. So even though sdfs is available (and only on bricks, not on the client, while this problem exists on the client too), in its current form it cannot be used practically, and hence I would say this bug is not fixed.

Comment 6
Raghavendra G

(In reply to Raghavendra G from comment #5)
> (In reply to Amar Tumballi from comment #4)
> > This can be resolved by `gluster volume set $VOL features.sdfs enable` (the
> > above patch finally made it to the repo).
>
> But doesn't enabling sdfs regress performance significantly, making it not
> a viable solution?

commit 829337ed3971a53086f1562d826e79d4f3e3ed39
Author: Amar Tumballi <amarts>
Date:   Mon Jan 28 18:30:24 2019 +0530

    features/sdfs: disable by default

    With the feature enabled, some of the performance testing results,
    specially those which create millions of small files, got approximately
    4x regression compared to version before enabling this.
    On master without this patch: 765 creates/sec
    On master with this patch:    3380 creates/sec

    Also there seems to be regression caused by this in 'ls -l' workload.

    On master without this patch: 3030 files/sec
    On master with this patch:    16610 files/sec

    This is a feature added to handle multiple clients parallely operating
    (specially those which race for file creates with same name) on a single
    namespace/directory. Considering that is < 3% of Gluster's usecase right
    now, it makes sense to disable the feature by default, so we don't
    penalize the default users who doesn't bother about this usecase. Also
    note that the client side translators, specially, distribute, replicate
    and disperse already handle the issue upto 99.5% of the cases without
    SDFS, so it makes sense to keep the feature disabled by default.

    Credits: Shyamsunder <srangana> for running the tests and getting the numbers.
    Change-Id: Iec49ce1d82e621e9db25eb633fcb1d932e74f4fc
    Updates: bz#1670031
    Signed-off-by: Amar Tumballi <amarts>

Moving this back to ASSIGNED till the discussion about the perf impact of sdfs is resolved.

Comment 8
Raghavendra G

Another data point for this bug: cyclic dentry loops can cause a *serious* performance regression in the lookup codepath, especially when the loops are formed relatively deep in the directory hierarchy, as they increase dentry search time exponentially (2^number-of-duplicate-dentries). So this bug is a correctness issue as well as a performance one. I've seen the issue in a production environment (we couldn't measure the perf impact precisely in that setup, but the impact was visible in test setups) - https://bugzilla.redhat.com/show_bug.cgi?id=1696353#c11

---

s/cyclic dentry loops/stale dentries. What I've said about performance in the previous comments is still valid.

Comment 10
Amar Tumballi

Raghavendra, Yes, SDFS has performance impact.
My reasoning for closing this bug was mainly that none of our users have hit this particular case (i.e., there are no reports of it). So, focusing on that work when no one needs it is meaningless, especially when there are options available.

---

Stale dentries in the client itable can end up making both src and dst exist after a "mv src dst" succeeds, with both resolving to the same inode. From one of the production setups using SAS:

md-cache doesn't resolve a (parent, basename) pair to an inode. Instead it's the access layers (fuse, gfapi) that resolve the path to an inode; md-cache just gives back the stat stored in the inode's context. Which means both

* (Nodeid:139842860710128, dm_errori.sas7bdat)
* (Nodeid:139842860710128, dm_errori.sas7bdat.lck)

are resolving to the same inode. Since they resolve to the same inode, the lookup is served from the same cache and hence returns identical stats. When md-cache is turned off, both lookups still go to the same inode, but the server-resolver on the brick ignores the gfid sent by the client; it resolves the entry freshly and hence gets a different inode for each lookup.

So the actual problem is in the fuse inode table, where there are two dentries for the same inode. How did we end up in that situation? It's likely that a lookup on (Nodeid:139842860710128, dm_errori.sas7bdat.lck) was racing with the rename (dm_errori.sas7bdat, dm_errori.sas7bdat.lck), and the rename updated the inode table on the client first. The lookup, which had hit storage/posix before the rename, then relinked the stale dentry. Without lookups reaching the bricks, the client never got a chance to flush its stale dentry. The problem is exactly the same one DFS is trying to solve on the brick stack. So the role of stat-prefetch here is that it prevents lookups from reaching the bricks; once lookups reach the bricks, the stale dentry is purged from the inode table and the error condition goes away.

Comment 12
Raghavendra G

(In reply to Amar Tumballi from comment #10)
> Raghavendra, Yes, SDFS has performance impact.
> But the feature is still present and can be enabled if user's work load
> demands it.
>
> My reasoning of closing this bug was mainly because for none of our users,
> this particular case was not hit (ie, there are no reports of it). So,
> focusing on that work when no one needs it is meaningless, that too when
> there are options available.

The second claim, that no user has hit it, is wrong. To repost the link I posted in comment #8 - https://bugzilla.redhat.com/show_bug.cgi?id=1696353#c11. I've also found this scenario on SAS setups - https://bugzilla.redhat.com/show_bug.cgi?id=1581306#c47

I've already reasoned about the non-feasibility of sdfs:

1. It's not available on the client. There is no option which can load it in client graphs; it can be loaded only on bricks. So even if the user can accept the perf impact (which is highly unlikely, see point 2), the solution is not complete.

2. The commit I posted in comment #6 measured the impact of sdfs loaded on the brick stack. We don't have any perf data measuring the impact if it gets loaded on the client graph. It's highly likely that the perf impact is much greater than the numbers posted in comment #6.

All it takes to hit this bug is a rename-heavy workload, and there are many of them, as the following paradigm is pretty common while copying a file (rsync, postgresql, SAS, to give specific examples):

* create a tmp file
* write to it
* rename the tmp file to the actual file

---

(In reply to Raghavendra G from comment #12)
> (In reply to Amar Tumballi from comment #10)
> > Raghavendra, Yes, SDFS has performance impact. But the feature is still
> > present and can be enabled if user's work load demands it.
> >
> > My reasoning of closing this bug was mainly because for none of our users,
> > this particular case was not hit (ie, there are no reports of it). So,
> > focusing on that work when no one needs it is meaningless, that too when
> > there are options available.
>
> The second claim about no user has hit it is wrong.
> To repost the link I posted in comment #8 -
> https://bugzilla.redhat.com/show_bug.cgi?id=1696353#c11. Also I've found
> this scenario on SAS setups too -
> https://bugzilla.redhat.com/show_bug.cgi?id=1581306#c47

Also, https://bugzilla.redhat.com/show_bug.cgi?id=1600923 on geo-rep setups. IIRC, this bug, when hit, affects geo-rep. Putting needinfo on Kotresh and Rochelle to explain how this bug affects geo-rep.

---

This bug has been moved to https://github.com/gluster/glusterfs/issues/962 and will be tracked there from now on. Visit the GitHub issue URL for further details.
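The exponential lookup cost claimed in comment #8 (2^number-of-duplicate-dentries) can be illustrated with a toy model: if each directory along a path carries two dentries (one live, one stale), a search that follows every parent link explores 2^depth ancestor chains. Everything below is illustrative; `count_chains` and the `parents` table are not glusterfs data structures.

```python
# Toy model of comment #8: duplicate dentries deep in the hierarchy make the
# number of candidate ancestor chains grow as 2^depth. "parents" maps an
# inode to the parent inodes its dentries point at.

def count_chains(parents, inode):
    """Count every ancestor chain a naive reverse-path search would walk."""
    links = parents.get(inode, [])
    if not links:  # reached a root with no parent dentry
        return 1
    return sum(count_chains(parents, p) for p in links)

# Two dentries (one live, one stale) per inode, at each of two levels:
parents = {
    "leaf": ["dir_live", "dir_stale"],
    "dir_live": ["root_live", "root_stale"],
    "dir_stale": ["root_live", "root_stale"],
}

chains = count_chains(parents, "leaf")  # 2 duplicated levels -> 2**2 chains
```

Each extra duplicated level doubles the count, which matches the comment's point that loops formed deep in the hierarchy hurt the most.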
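The lookup/rename race analyzed in the md-cache comment can be replayed against a toy inode table: the rename updates the table first, then a lookup that resolved the old name on the brick before the rename re-links the now-stale dentry. The `InodeTable` class below is a deliberately minimal sketch, not the fuse itable API; all names are illustrative.

```python
# Toy replay of the race from the md-cache analysis: after "mv src dst",
# both src and dst end up resolving to the same inode.

class InodeTable:
    def __init__(self):
        self.dentries = {}  # (parent, name) -> inode number

    def link(self, parent, name, inode):
        # What a lookup reply does: insert/refresh a dentry.
        self.dentries[(parent, name)] = inode

    def rename(self, parent, src, dst):
        # Move the dentry from src to dst, as rename does.
        inode = self.dentries.pop((parent, src))
        self.dentries[(parent, dst)] = inode

    def names_of(self, inode):
        return sorted(n for (p, n), i in self.dentries.items() if i == inode)

table = InodeTable()
table.link("/", "dm_errori.sas7bdat", inode=42)

# The rename wins the race and updates the table first ...
table.rename("/", "dm_errori.sas7bdat", "dm_errori.sas7bdat.lck")

# ... then the lookup, which resolved the old name on the brick *before*
# the rename, re-links the stale dentry:
table.link("/", "dm_errori.sas7bdat", 42)

# Now two dentries point at inode 42: the bug's symptom.
```

With md-cache serving subsequent lookups from cache, nothing ever purges the stale entry, which is exactly the condition described above.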
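The create/write/rename paradigm listed in comment #12 is the standard atomic-replace idiom; a minimal Python sketch using `os.replace` (a rename on POSIX) looks like this. The function name is illustrative. On a glusterfs client, it is the final rename step that can race with a concurrent lookup and leave a stale dentry behind.

```python
# Sketch of the copy paradigm from comment #12 (rsync, postgresql, SAS all
# follow it): create a temp file, write it fully, then rename over the target.
import os
import tempfile

def atomic_write(path, data):
    # Create the temp file in the same directory so the final rename stays
    # on one filesystem (rename is not atomic across filesystems).
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # the rename step; tmp must vanish afterwards
    except Exception:
        os.unlink(tmp)
        raise
```

After a successful `os.replace`, only the destination name should exist; the bug discussed here is precisely the case where the client inode table still resolves the old temp name as well.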