Description of problem:
The current xlator graph has write-behind as a child of md-cache. When writes are cached, write-behind returns NULL values for stats. So a write-heavy workload keeps removing stats from the cache, rendering md-cache useless.
If we load md-cache as a child of write-behind, the write cbk will carry stats from the bricks, and hence the cache will be updated with the latest stat in write workloads.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
REVIEW: https://review.gluster.org/22124 (performance/md-cache: load as a child of write-behind) posted (#1) for review on master by Raghavendra G
Is this a blocker to release-6? Can we please re-evaluate?
Do we want this in or shall we close it?
(In reply to Yaniv Kaul from comment #3)
> Do we want this in or shall we close it?
I'm not sure about the importance of this issue. I think we can close it.
(In reply to Sanju from comment #4)
> (In reply to Yaniv Kaul from comment #3)
> > Do we want this in or shall we close it?
> I'm not sure about the importance of this issue. I think we can close it.
From the patch description:
"This benefits write workload as md-cache can absorb fstats calls from kernel."
I'd like to understand better whether there is a benefit here. I'm re-running the regression tests to see, at least, whether it's stable first.
The main issue here is that write-behind returns NULL for the post iatt after a cached write. When md-cache sees a NULL iatt, it invalidates its cache. This happens because write-behind does not cache metadata, so it can't provide a meaningful iatt when the write has not yet been processed by the bricks.
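To make the invalidation concrete, here is a minimal toy model of the current graph (md-cache on top of write-behind). It is only a sketch: the class names, the `Iatt` stand-in and the `brick_fstats` helper are hypothetical, and none of this is the real xlator API. It just counts how many fstat calls reach the brick.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Iatt:
    """Hypothetical stand-in for the attributes an xlator returns."""
    size: int


class Brick:
    """Toy brick: applies writes and counts the fstat calls that reach it."""

    def __init__(self) -> None:
        self.size = 0
        self.fstat_calls = 0

    def write(self, nbytes: int) -> Optional[Iatt]:
        self.size += nbytes
        return Iatt(self.size)  # real post-op attributes

    def fstat(self) -> Iatt:
        self.fstat_calls += 1
        return Iatt(self.size)


class WriteBehind:
    """Toy write-behind: acknowledges a write before the brick has seen it."""

    def __init__(self, child) -> None:
        self.child = child
        self.pending = []

    def write(self, nbytes: int) -> Optional[Iatt]:
        self.pending.append(nbytes)
        return None  # no meaningful post iatt for a cached write

    def fstat(self) -> Iatt:
        # Flush cached writes first so the brick reports the current size.
        for nbytes in self.pending:
            self.child.write(nbytes)
        self.pending.clear()
        return self.child.fstat()


class MdCache:
    """Toy md-cache: remembers the last iatt and drops it when it sees None."""

    def __init__(self, child) -> None:
        self.child = child
        self.cached: Optional[Iatt] = None

    def write(self, nbytes: int) -> Optional[Iatt]:
        post = self.child.write(nbytes)
        self.cached = post  # a None post iatt invalidates the entry
        return post

    def fstat(self) -> Iatt:
        if self.cached is None:
            self.cached = self.child.fstat()  # cache miss, go to the child
        return self.cached


def brick_fstats(with_write_behind: bool, rounds: int) -> int:
    """Run `rounds` write+fstat pairs and return the fstats seen by the brick."""
    brick = Brick()
    top = MdCache(WriteBehind(brick) if with_write_behind else brick)
    for _ in range(rounds):
        top.write(4096)
        top.fstat()
    return brick.fstat_calls
```

In this model, `brick_fstats(True, n)` sends every fstat to the brick because each cached write wipes the entry, while `brick_fstats(False, n)` sends none because the brick's post iatt keeps the cache fresh.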
On the other hand, placing md-cache after write-behind requires write-behind to serialize all operations that return an iatt (virtually all of them), so that by the time they reach md-cache (at least those that the user has sent sequentially and write-behind has answered directly), the previous operations have been completely executed and have updated the cached metadata. I'm not 100% sure that this is what write-behind does in all cases, but doing so is inefficient. For example, if the user sends a write request and then an fstat request, write-behind will absorb the write, but when it sees the fstat request, the cached write needs to be flushed and the fstat delayed until the write finishes so that the fstat gets the correct answer.
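The write-then-fstat example can be sketched with a toy model of the proposed ordering (write-behind on top of md-cache). The classes, the event log and `write_then_fstat` are hypothetical and only illustrate the ordering constraint; they say nothing about what the real write-behind does in every case.

```python
from typing import List, Optional, Tuple


class Brick:
    """Toy brick that logs every request it receives."""

    def __init__(self, log: List[str]) -> None:
        self.size = 0
        self.log = log

    def write(self, nbytes: int) -> int:
        self.size += nbytes
        self.log.append("brick: write")
        return self.size  # post-op size, standing in for the post iatt

    def fstat(self) -> int:
        self.log.append("brick: fstat")
        return self.size


class MdCache:
    """Toy md-cache placed below write-behind."""

    def __init__(self, child: Brick, log: List[str]) -> None:
        self.child = child
        self.log = log
        self.size: Optional[int] = None

    def write(self, nbytes: int) -> int:
        self.size = self.child.write(nbytes)  # brick's answer refreshes the cache
        return self.size

    def fstat(self) -> int:
        if self.size is None:
            self.size = self.child.fstat()
        else:
            self.log.append("md-cache: fstat hit")
        return self.size


class WriteBehind:
    """Toy write-behind placed above md-cache."""

    def __init__(self, child: MdCache, log: List[str]) -> None:
        self.child = child
        self.log = log
        self.pending: List[int] = []

    def write(self, nbytes: int) -> None:
        self.pending.append(nbytes)
        self.log.append("write-behind: write acked")

    def fstat(self) -> int:
        # The fstat cannot be forwarded until every cached write has
        # completed; otherwise md-cache would answer with a stale size.
        while self.pending:
            self.child.write(self.pending.pop(0))
        return self.child.fstat()


def write_then_fstat() -> Tuple[int, List[str]]:
    """Send one write followed by one fstat and return the size and event log."""
    log: List[str] = []
    top = WriteBehind(MdCache(Brick(log), log), log)
    top.write(4096)
    size = top.fstat()
    return size, log
```

In this sketch md-cache does absorb the fstat, but only after the fstat has waited for the flushed write to make a full round trip to the brick, which is the inefficiency described above.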
I think a good approach here would be to unify the caching layers and provide an authoritative cache with consistency guarantees. There's a GitHub issue for this, but it's probably a very ambitious approach for a first implementation. Maybe we could get most of the benefits by using a lock-based cache that integrates metadata and data. With this approach, an fstat after a write could be served even without flushing the cached write, so both requests could be served directly from the client cache with no network/brick activity.
If that's the way to go, I would close this bug and create a new one (or a feature request on GitHub) to implement that caching.
What do you think?
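A very rough sketch of what I mean by a lock-based cache that integrates metadata and data. Everything here is an assumption made for illustration (the `UnifiedCache` name, the `acquire`/`release` calls, and the idea that holding the lock gives the client exclusive access); it is not an existing GlusterFS interface nor the design from the GitHub issue.

```python
from typing import List, Tuple


class Brick:
    """Toy brick that counts every request it receives."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.requests = 0

    def write(self, offset: int, buf: bytes) -> None:
        self.requests += 1
        end = offset + len(buf)
        if end > len(self.data):
            self.data.extend(bytes(end - len(self.data)))  # zero-fill the gap
        self.data[offset:end] = buf

    def fstat_size(self) -> int:
        self.requests += 1
        return len(self.data)


class UnifiedCache:
    """Hypothetical client cache holding both dirty data and metadata."""

    def __init__(self, brick: Brick) -> None:
        self.brick = brick
        self.locked = False
        self.size = 0
        self.dirty: List[Tuple[int, bytes]] = []

    def acquire(self) -> None:
        # Assumption: while the lock is held nobody else can modify the
        # file, so the size read here stays authoritative until release().
        self.size = self.brick.fstat_size()
        self.locked = True

    def write(self, offset: int, buf: bytes) -> None:
        if not self.locked:
            raise RuntimeError("cache is only valid while the lock is held")
        self.dirty.append((offset, bytes(buf)))
        # Metadata is updated together with the cached data.
        self.size = max(self.size, offset + len(buf))

    def fstat_size(self) -> int:
        if not self.locked:
            raise RuntimeError("cache is only valid while the lock is held")
        return self.size  # served locally, no flush needed

    def release(self) -> None:
        # Flush cached writes before giving the lock back.
        for offset, buf in self.dirty:
            self.brick.write(offset, buf)
        self.dirty.clear()
        self.locked = False
```

With this model, a write followed by an fstat between `acquire()` and `release()` generates no brick requests at all; the brick only sees the writes when the lock is released.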
I'd close and leave the feature request in GitHub. This one certainly looks like a sizable project - more than we should commit to right now.
+1 on moving the discussion to GitHub with the enhancement tag. Let's get the discussion going there, and see when and 'who' can pick this up.
Based on the latest comments, I'm closing this bug, and I've updated the GitHub issue to work along these lines.