+++ This bug was initially created as a clone of Bug #1438706 +++

+++ This bug was initially created as a clone of Bug #1436739 +++

Description of problem:

As per Sanjay Rao's inputs, there was a performance drop in a random-read fio workload when run through VMs hosted on sharded volumes. The volume profile indicated a big difference between the number of lookups sent by FUSE and the number of lookups received by the individual bricks. Through code reading, it was found that there is a performance bug in shard which was causing the translator to trigger an unusually high number of lookups for cache invalidation even when there was no modification to the file.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

--- Additional comment from Worker Ant on 2017-03-28 10:23:33 EDT ---

REVIEW: https://review.gluster.org/16961 (features/shard: Pass the correct iatt for cache invalidation) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Worker Ant on 2017-03-30 01:48:38 EDT ---

REVIEW: https://review.gluster.org/16961 (features/shard: Pass the correct iatt for cache invalidation) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Worker Ant on 2017-03-30 02:15:59 EDT ---

COMMIT: https://review.gluster.org/16961 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit dd5ada1f11d76b4c55c7c55d23718617f11a6c12
Author: Krutika Dhananjay <kdhananj>
Date:   Tue Mar 28 19:26:41 2017 +0530

    features/shard: Pass the correct iatt for cache invalidation

    This fixes a performance issue with shard which was causing the
    translator to trigger unusually high number of lookups for cache
    invalidation even when there was no modification to the file.

    In shard_common_stat_cbk(), it is local->prebuf that contains the
    aggregated size and block count as opposed to buf which only holds
    the attributes for the physical copy of base shard. Passing buf for
    inode_ctx invalidation would always set refresh to true since the
    file size in inode ctx contains the aggregated size and would never
    be same as @buf->ia_size. This was leading to every write/read being
    preceded by a lookup on the base shard even when the file underwent
    no modification.

    Change-Id: Ib0349291d2d01f3782d6d0bdd90c6db5e0609210
    BUG: 1436739
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/16961
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

--- Additional comment from Krutika Dhananjay on 2017-04-04 05:54:06 EDT ---

https://code.engineering.redhat.com/gerrit/#/c/102390/
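The mechanism described in the commit message can be summarized with a small, self-contained model. This is only an illustrative sketch, not the real shard.c code: the inode_ctx struct and needs_refresh() function below are hypothetical stand-ins for the shard translator's per-inode cache and its size comparison, and the sizes are made-up examples. It shows why invalidating with buf (which describes only the base shard) always forces a refresh, while invalidating with local->prebuf (the aggregated iatt) does not when the file is unmodified.

/* Illustrative model only -- not GlusterFS code.  inode_ctx and
 * needs_refresh() are hypothetical stand-ins for shard's per-inode
 * cache and its "has the size changed?" check. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct inode_ctx {
    uint64_t cached_size;   /* aggregated file size across all shards */
};

/* A fresh lookup is forced whenever the cached size differs from the
 * size in the iatt handed to the invalidation path. */
static bool needs_refresh(const struct inode_ctx *ctx, uint64_t iatt_size)
{
    return ctx->cached_size != iatt_size;
}

int main(void)
{
    uint64_t aggregated_size = 4ULL * 1024 * 1024 * 1024; /* e.g. a 4 GiB image */
    uint64_t base_shard_size = 64ULL * 1024 * 1024;       /* only the base shard */

    struct inode_ctx ctx = { .cached_size = aggregated_size };

    /* Pre-fix behaviour: invalidate with @buf, which describes only the
     * physical base shard, so the comparison never matches. */
    printf("invalidate with buf            -> refresh = %d\n",
           needs_refresh(&ctx, base_shard_size));

    /* Post-fix behaviour: invalidate with local->prebuf, the aggregated
     * iatt, so an unmodified file does not trigger extra lookups. */
    printf("invalidate with local->prebuf  -> refresh = %d\n",
           needs_refresh(&ctx, aggregated_size));

    return 0;
}

Compiled and run, the first line prints refresh = 1 (every read/write preceded by a lookup on the base shard) and the second prints refresh = 0 for an unmodified file, which matches the drop in brick-side lookups reported in the comments below.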
Upstream patch: https://review.gluster.org/16961
Confirmed with the dev (Krutika) that the fix is not intrusive and doesn't bring in much change. A basic functionality test should suffice.
https://code.engineering.redhat.com/gerrit/#/c/102819/
Hi Krutika,

I tested this with the patch you provided and I see the following results.

Without the fix, these are the numbers I see:
====================================================
Total lookups sent over the fuse mount: 1057

Total lookups sent over the brick: 16273

There were ~1100 lookups from the mount while the brick got > 16200 lookups.

With the fix, these are the numbers I see:
==============================================
Total lookups sent over the fuse mount: 1280

Total lookups sent over the brick: 3037

There were ~1300 lookups from the mount while the brick got > 3000 lookups. Is this an acceptable number?
(In reply to RamaKasturi from comment #5)
> Hi Krutika,
>
> I tested this with the patch you provided and I see the following results.
>
> Without the fix, these are the numbers I see:
> ====================================================
> Total lookups sent over the fuse mount: 1057
>
> Total lookups sent over the brick: 16273
>
> There were ~1100 lookups from the mount while the brick got > 16200 lookups.
>
> With the fix, these are the numbers I see:
> ==============================================
> Total lookups sent over the fuse mount: 1280
>
> Total lookups sent over the brick: 3037

How did you calculate the number of lookups received by the bricks?

-Krutika

> There were ~1300 lookups from the mount while the brick got > 3000 lookups.
> Is this an acceptable number?
(In reply to Krutika Dhananjay from comment #6)
> (In reply to RamaKasturi from comment #5)
> > Hi Krutika,
> >
> > I tested this with the patch you provided and I see the following results.
> >
> > Without the fix, these are the numbers I see:
> > ====================================================
> > Total lookups sent over the fuse mount: 1057
> >
> > Total lookups sent over the brick: 16273
> >
> > There were ~1100 lookups from the mount while the brick got > 16200 lookups.
> >
> > With the fix, these are the numbers I see:
> > ==============================================
> > Total lookups sent over the fuse mount: 1280
> >
> > Total lookups sent over the brick: 3037
>
> How did you calculate the number of lookups received by the bricks?

I checked the profile output for the data volume: under one of the bricks I looked at the "Interval 0 stats" section and read the number of calls against the LOOKUP fop.

Here is the output from the profile of the data volume:
=======================================================
http://pastebin.test.redhat.com/475027

>
> -Krutika
>
> > There were ~1300 lookups from the mount while the brick got > 3000 lookups.
> > Is this an acceptable number?
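As a rough illustration of the counting method described above, here is a small, hypothetical helper (not part of GlusterFS or of the test setup in this bug) that totals the LOOKUP call counts from a saved profile section. It assumes the usual gluster volume profile <VOLNAME> info layout, where the fop name is the last column of each stats row and the "No. of calls" value is the column just before it, and that the input file contains only the block of interest (for example, one brick's "Interval 0 Stats" section) so cumulative and interval stats are not double-counted.

/* Hypothetical helper -- not shipped with GlusterFS.  Sums the
 * "No. of calls" column of every LOOKUP row in a saved profile section. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <profile-section-file>\n", argv[0]);
        return 1;
    }

    FILE *fp = fopen(argv[1], "r");
    if (!fp) {
        perror("fopen");
        return 1;
    }

    char line[1024];
    unsigned long long total = 0;

    while (fgets(line, sizeof(line), fp)) {
        /* Split the row into whitespace-separated fields. */
        char *fields[64];
        int n = 0;
        char *saveptr = NULL;
        for (char *tok = strtok_r(line, " \t\r\n", &saveptr);
             tok != NULL && n < 64;
             tok = strtok_r(NULL, " \t\r\n", &saveptr))
            fields[n++] = tok;

        /* A stats row ends with the fop name; the call count precedes it. */
        if (n >= 2 && strcmp(fields[n - 1], "LOOKUP") == 0)
            total += strtoull(fields[n - 2], NULL, 10);
    }
    fclose(fp);

    printf("total LOOKUP calls: %llu\n", total);
    return 0;
}

Run against a file containing one brick's stats block, it prints a single total that can be compared with the lookup count seen on the fuse mount.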
Verified and works fine with build glusterfs-3.8.4-18.1.el7rhgs.x86_64.

Tested this with 3 VMs, each hosted on a single hypervisor. Below is the process I followed for testing:

1) Install HC with three hypervisors.
2) Create 1 VM on each of them.
3) Install fio on the VMs.
4) Run the 80:20 workload on the VMs initially; I profiled the fuse mount but did not save the values, since writes are also involved there.
5) Run the 100 percent random-read workload and profile the fuse mount from all three hypervisors.

1st iteration:
========================
Total lookups sent over the fuse mount: 785
Total lookups sent over the brick: 1131

2nd iteration:
======================
Total lookups sent over the fuse mount: 810
Total lookups sent over the brick: 1111

3rd iteration:
======================
Total lookups sent over the fuse mount: 798
Total lookups sent over the brick: 1105

Profile output from the fuse mount and the brick is available at the link below:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/output_lookups/
Krutika, could you please review and sign off on the edited doc text?
Looks good to me.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1418