+++ This bug was initially created as a clone of Bug #1438706 +++

+++ This bug was initially created as a clone of Bug #1436739 +++

Description of problem:

As per Sanjay Rao's inputs, there was a performance drop in a random-read fio workload when run through VMs hosted on sharded volumes. The volume profile indicated a big difference between the number of lookups sent by FUSE and the number of lookups received by the individual bricks. Through code reading, it was found that there is a performance bug in shard which was causing the translator to trigger an unusually high number of lookups for cache invalidation even when there was no modification to the file.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

--- Additional comment from Worker Ant on 2017-03-28 10:23:33 EDT ---

REVIEW: https://review.gluster.org/16961 (features/shard: Pass the correct iatt for cache invalidation) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Worker Ant on 2017-03-30 01:48:38 EDT ---

REVIEW: https://review.gluster.org/16961 (features/shard: Pass the correct iatt for cache invalidation) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Worker Ant on 2017-03-30 02:15:59 EDT ---

COMMIT: https://review.gluster.org/16961 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit dd5ada1f11d76b4c55c7c55d23718617f11a6c12
Author: Krutika Dhananjay <kdhananj>
Date:   Tue Mar 28 19:26:41 2017 +0530

    features/shard: Pass the correct iatt for cache invalidation

    This fixes a performance issue with shard which was causing the
    translator to trigger unusually high number of lookups for cache
    invalidation even when there was no modification to the file.

    In shard_common_stat_cbk(), it is local->prebuf that contains the
    aggregated size and block count as opposed to buf which only holds
    the attributes for the physical copy of base shard. Passing buf for
    inode_ctx invalidation would always set refresh to true since the
    file size in inode ctx contains the aggregated size and would never
    be same as @buf->ia_size. This was leading to every write/read being
    preceded by a lookup on the base shard even when the file underwent
    no modification.

    Change-Id: Ib0349291d2d01f3782d6d0bdd90c6db5e0609210
    BUG: 1436739
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/16961
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

--- Additional comment from Krutika Dhananjay on 2017-04-04 05:54:06 EDT ---

https://code.engineering.redhat.com/gerrit/#/c/102390/
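The mechanism described in the commit message can be summarized with a small, self-contained model. This is only an illustrative sketch, not the real shard.c code: the inode_ctx struct and needs_refresh() function below are hypothetical stand-ins for the shard translator's per-inode cache and its size comparison, and the sizes are made-up examples. It shows why invalidating with buf (which describes only the base shard) always forces a refresh, while invalidating with local->prebuf (the aggregated iatt) does not when the file is unmodified.

/* Illustrative model only -- not GlusterFS code.  inode_ctx and
 * needs_refresh() are hypothetical stand-ins for shard's per-inode
 * cache and its "has the size changed?" check. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct inode_ctx {
    uint64_t cached_size;   /* aggregated file size across all shards */
};

/* A fresh lookup is forced whenever the cached size differs from the
 * size in the iatt handed to the invalidation path. */
static bool needs_refresh(const struct inode_ctx *ctx, uint64_t iatt_size)
{
    return ctx->cached_size != iatt_size;
}

int main(void)
{
    uint64_t aggregated_size = 4ULL * 1024 * 1024 * 1024; /* e.g. a 4 GiB image */
    uint64_t base_shard_size = 64ULL * 1024 * 1024;       /* only the base shard */

    struct inode_ctx ctx = { .cached_size = aggregated_size };

    /* Pre-fix behaviour: invalidate with @buf, which describes only the
     * physical base shard, so the comparison never matches. */
    printf("invalidate with buf            -> refresh = %d\n",
           needs_refresh(&ctx, base_shard_size));

    /* Post-fix behaviour: invalidate with local->prebuf, the aggregated
     * iatt, so an unmodified file does not trigger extra lookups. */
    printf("invalidate with local->prebuf  -> refresh = %d\n",
           needs_refresh(&ctx, aggregated_size));

    return 0;
}

Compiled and run, the first line prints refresh = 1 (every read/write preceded by a lookup on the base shard) and the second prints refresh = 0 for an unmodified file, which matches the drop in brick-side lookups reported in the comments below.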
Upstream patch: https://review.gluster.org/16961
Confirmed with the dev (Krutika) that the fix is not intrusive and doesn't bring in much change. A basic functionality test should suffice.
https://code.engineering.redhat.com/gerrit/#/c/102819/
Hi Krutika,

I tested this with the patch you provided and I see the following results.

Without the fix, these are the numbers I see:
====================================================
Total lookups sent over the fuse mount: 1057

Total lookups sent over the brick: 16273

There were ~1100 lookups from the mount while the brick got > 16200 lookups.

With the fix, these are the numbers I see:
==============================================
Total lookups sent over the fuse mount: 1280

Total lookups sent over the brick: 3037

There were ~1300 lookups from the mount while the brick got > 3000 lookups. Is this an acceptable number?
(In reply to RamaKasturi from comment #5)
> Hi Krutika,
>
> I tested this with the patch you provided and I see the following results.
>
> Without the fix, these are the numbers I see:
> ====================================================
> Total lookups sent over the fuse mount: 1057
>
> Total lookups sent over the brick: 16273
>
> There were ~1100 lookups from the mount while the brick got > 16200 lookups.
>
> With the fix, these are the numbers I see:
> ==============================================
> Total lookups sent over the fuse mount: 1280
>
> Total lookups sent over the brick: 3037

How did you calculate the number of lookups received by the bricks?

-Krutika

> There were ~1300 lookups from the mount while the brick got > 3000 lookups.
> Is this an acceptable number?
(In reply to Krutika Dhananjay from comment #6)
> (In reply to RamaKasturi from comment #5)
> > Hi Krutika,
> >
> > I tested this with the patch you provided and I see the following results.
> >
> > Without the fix, these are the numbers I see:
> > ====================================================
> > Total lookups sent over the fuse mount: 1057
> >
> > Total lookups sent over the brick: 16273
> >
> > There were ~1100 lookups from the mount while the brick got > 16200 lookups.
> >
> > With the fix, these are the numbers I see:
> > ==============================================
> > Total lookups sent over the fuse mount: 1280
> >
> > Total lookups sent over the brick: 3037
>
> How did you calculate the number of lookups received by the bricks?

I checked the profile output for the data volume: under one of the bricks I looked at the "Interval 0 stats" section and read the number of calls against the LOOKUP fop.

Here is the output from the profile of the data volume:
=======================================================
http://pastebin.test.redhat.com/475027

>
> -Krutika
>
> > There were ~1300 lookups from the mount while the brick got > 3000 lookups.
> > Is this an acceptable number?
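As a rough illustration of the counting method described above, here is a small, hypothetical helper (not part of GlusterFS or of the test setup in this bug) that totals the LOOKUP call counts from a saved profile section. It assumes the usual gluster volume profile <VOLNAME> info layout, where the fop name is the last column of each stats row and the "No. of calls" value is the column just before it, and that the input file contains only the block of interest (for example, one brick's "Interval 0 Stats" section) so cumulative and interval stats are not double-counted.

/* Hypothetical helper -- not shipped with GlusterFS.  Sums the
 * "No. of calls" column of every LOOKUP row in a saved profile section. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <profile-section-file>\n", argv[0]);
        return 1;
    }

    FILE *fp = fopen(argv[1], "r");
    if (!fp) {
        perror("fopen");
        return 1;
    }

    char line[1024];
    unsigned long long total = 0;

    while (fgets(line, sizeof(line), fp)) {
        /* Split the row into whitespace-separated fields. */
        char *fields[64];
        int n = 0;
        char *saveptr = NULL;
        for (char *tok = strtok_r(line, " \t\r\n", &saveptr);
             tok != NULL && n < 64;
             tok = strtok_r(NULL, " \t\r\n", &saveptr))
            fields[n++] = tok;

        /* A stats row ends with the fop name; the call count precedes it. */
        if (n >= 2 && strcmp(fields[n - 1], "LOOKUP") == 0)
            total += strtoull(fields[n - 2], NULL, 10);
    }
    fclose(fp);

    printf("total LOOKUP calls: %llu\n", total);
    return 0;
}

Run against a file containing one brick's stats block, it prints a single total that can be compared with the lookup count seen on the fuse mount.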
Verified and works fine with build glusterfs-3.8.4-18.1.el7rhgs.x86_64.

Tested this with 3 VMs, each hosted on a single hypervisor. Below is the process I followed for testing:

1) Install HC with three hypervisors.
2) Create 1 VM on each of them.
3) Install fio on the VMs.
4) Run the 80:20 workload on the VMs initially; I profiled the fuse mount but did not save the values, since writes are also involved there.
5) Run the 100 percent random-read workload and profile the fuse mount from all three hypervisors.

1st iteration:
========================
Total lookups sent over the fuse mount: 785
Total lookups sent over the brick: 1131

2nd iteration:
======================
Total lookups sent over the fuse mount: 810
Total lookups sent over the brick: 1111

3rd iteration:
======================
Total lookups sent over the fuse mount: 798
Total lookups sent over the brick: 1105

Profile output from the fuse mount and the brick is available at the link below:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/output_lookups/
Krutika, could you please review and sign off on the edited doc text?
Looks good to me.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1418