Bug 1630368

Summary: Low Random write IOPS in VM workloads
Product: [Community] GlusterFS
Reporter: Pranith Kumar K <pkarampu>
Component: replicate
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: urgent
Docs Contact:
Priority: high
Version: mainline
CC: ahadas, bugs, famz, godas, guillaume.pavese, kdhananj, ksubrahm, kwolf, michal.skrivanek, mpillai, nichawla, pasik, pkarampu, psuriset, ravishankar, rhs-bugs, sabose, sankarshan, sasundar, shberry, vbellur, ykaul
Target Milestone: ---
Keywords: Performance, Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-6.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1616270
: 1635972
Environment:
Last Closed: 2019-03-25 16:30:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1630688, 1635972, 1635976, 1635980

Comment 1 Pranith Kumar K 2018-09-18 13:20:53 UTC
Description of problem:
To get the baseline random write IOPS, an SSD gluster volume (the one that is added as a storage domain in oVirt to provide block storage to provisioned VMs) was mounted on one of the hosts and a fio random write run was executed over it. Later, the same volume was added as a storage domain and attached as a block device to a Virtual Machine (VM) running on top of oVirt, and the same fio random write test was executed within the VM. Following are the results:

1. Baseline Random Write IOPS: 6456
2. Virtual Machine Random Write IOPS: 602

Comment 2 Worker Ant 2018-09-18 13:24:23 UTC
REVIEW: https://review.gluster.org/21210 (cluster/afr: Make data eager-lock decision based on conflicting locks) posted (#1) for review on master by Pranith Kumar Karampuri

Comment 3 Worker Ant 2018-09-19 06:59:58 UTC
REVIEW: https://review.gluster.org/21214 (cluster/afr: Add eager-lock stats for better debugging experience) posted (#1) for review on master by Krutika Dhananjay

Comment 4 Worker Ant 2018-09-21 04:43:24 UTC
COMMIT: https://review.gluster.org/21210 committed in master by "Pranith Kumar Karampuri" <pkarampu> with a commit message- cluster/afr: Make data eager-lock decision based on number of locks

For both virt and block workloads the file is opened multiple times,
which leads to eager-lock being dynamically set to off for the workload.
Instead of depending on the number-of-open-fds, if we change the
logic to depend on the number of inodelks, it gives better
performance than the earlier logic. When there is an eager-lock
and the number of inodelks is more than 1, we know that there is a
conflicting lock, so we depend on that information to decide whether
to let the current transaction go through delayed-post-op or not.

The locks xlator doesn't have an implementation to query the number of
locks in fxattrop in releases older than 3.10, so to keep things backward
compatible in 3.12, data transactions will use the new logic whereas
fxattrop transactions will use the old logic. I am planning to send one
more patch which makes metadata domain locks also depend on
inodelk-count.
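
The difference between the two heuristics described above can be sketched as a toy model. This is not the AFR C code: `LockState`, `old_keep_eager_lock`, and `new_keep_eager_lock` are hypothetical names chosen for illustration, and the real translator tracks considerably more state.

```python
from dataclasses import dataclass


@dataclass
class LockState:
    """Hypothetical snapshot of what the replicate translator knows about a file."""
    open_fd_count: int   # number of fds currently open on the file
    inodelk_count: int   # number of inodelks reported for the file


def old_keep_eager_lock(state: LockState) -> bool:
    # Earlier logic (as described in the commit message): a file opened
    # more than once causes eager-lock to be turned off.
    return state.open_fd_count <= 1


def new_keep_eager_lock(state: LockState) -> bool:
    # New logic: more than one inodelk means a conflicting lock exists,
    # so only then is the delayed post-op given up.
    return state.inodelk_count <= 1


# A VM image that is opened twice but locked by a single writer.
vm_image = LockState(open_fd_count=2, inodelk_count=1)
print("old logic keeps eager-lock:", old_keep_eager_lock(vm_image))
print("new logic keeps eager-lock:", new_keep_eager_lock(vm_image))
```

In this model the doubly opened file loses eager-lock under the old check but keeps it under the new one, which is the situation the commit message describes for virt and block workloads.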

Profile info for a dd of 500MB to a file with another fd opened
on the file using exec 250>filename (columns: %-latency, average
latency, minimum latency, maximum latency, number of calls, fop)

Without this patch:
 0.14      67.41 us      16.72 us    3870.82 us  892 FINODELK
 0.59     279.87 us      95.71 us    2085.89 us  898 FXATTROP
 3.46     366.43 us      81.75 us    6952.79 us 4000 WRITE
95.79  148733.99 us   50568.12 us  919127.86 us  273 FSYNC

With this patch:
 0.00      51.01 us      38.07 us      80.16 us    4 FINODELK
 0.00     235.43 us     235.43 us     235.43 us    1 TRUNCATE
 0.00     125.07 us      56.80 us     193.33 us    2 GETXATTR
 0.00     135.86 us      62.13 us     209.59 us    2  INODELK
 0.00     197.88 us     155.39 us     253.90 us    4 FXATTROP
 0.00     450.59 us     394.28 us     506.89 us    2  XATTROP
 0.00      56.96 us      19.06 us     406.59 us   23    FLUSH
37.81  273648.93 us      48.43 us 6017657.05 us   44   LOOKUP
62.18    4951.86 us      93.80 us 1143154.75 us 3999    WRITE
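
To compare the two profiles more easily, the rows can be parsed and the call counts set side by side. The sketch below assumes the columns are %-latency, average, minimum, and maximum latency, number of calls, and fop name; `parse_profile` and the `ROW` pattern are hypothetical helpers, not part of any gluster tool.

```python
import re

WITHOUT_PATCH = """\
 0.14      67.41 us      16.72 us    3870.82 us  892 FINODELK
 0.59     279.87 us      95.71 us    2085.89 us  898 FXATTROP
 3.46     366.43 us      81.75 us    6952.79 us 4000 WRITE
95.79  148733.99 us   50568.12 us  919127.86 us  273 FSYNC
"""

WITH_PATCH = """\
 0.00      51.01 us      38.07 us      80.16 us    4 FINODELK
 0.00     235.43 us     235.43 us     235.43 us    1 TRUNCATE
 0.00     125.07 us      56.80 us     193.33 us    2 GETXATTR
 0.00     135.86 us      62.13 us     209.59 us    2  INODELK
 0.00     197.88 us     155.39 us     253.90 us    4 FXATTROP
 0.00     450.59 us     394.28 us     506.89 us    2  XATTROP
 0.00      56.96 us      19.06 us     406.59 us   23    FLUSH
37.81  273648.93 us      48.43 us 6017657.05 us   44   LOOKUP
62.18    4951.86 us      93.80 us 1143154.75 us 3999    WRITE
"""

# Assumed row layout: pct, avg, min, max (all latencies in us), calls, fop.
ROW = re.compile(
    r"^\s*([\d.]+)\s+([\d.]+) us\s+([\d.]+) us\s+([\d.]+) us\s+(\d+)\s+(\w+)\s*$"
)


def parse_profile(text):
    """Return {fop: {"pct": float, "avg_us": float, "calls": int}}."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            pct, avg, _min, _max, calls, fop = match.groups()
            rows[fop] = {"pct": float(pct), "avg_us": float(avg), "calls": int(calls)}
    return rows


before = parse_profile(WITHOUT_PATCH)
after = parse_profile(WITH_PATCH)

# Fops absent from a profile are reported with zero calls.
for fop in ("FINODELK", "FXATTROP", "FSYNC", "WRITE"):
    b = before.get(fop, {}).get("calls", 0)
    a = after.get(fop, {}).get("calls", 0)
    print(f"{fop:9s} calls: {b:5d} -> {a:5d}")
```

Read this way, the lock and xattr traffic per write nearly disappears with the patch, and FSYNC, which accounted for most of the latency before, no longer shows up in the profile.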

The postgresql benchmark performance changed from ~1130 TPS to ~2300 TPS.
The randio fio job inside an oVirt-based VM went from ~600 IOPS to ~2000 IOPS.

fixes bz#1630368
Change-Id: If7f7388d2f08cf7f17ca517a4ea222560661dc36
Signed-off-by: Pranith Kumar K <pkarampu>
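
For a rough sense of scale, the figures from the description and the commit message can be combined as below. The numbers are the approximate values quoted in this report; the variable names are illustrative only.

```python
# Figures quoted in this bug report (the "~" values are approximate).
baseline_iops = 6456      # fio random write on the gluster mount on a host
vm_iops_reported = 602    # same fio test inside the VM, from the description
vm_iops_before = 600      # "~600 IOPS" from the commit message
vm_iops_after = 2000      # "~2000 IOPS" from the commit message
tps_before = 1130         # postgresql benchmark before the patch
tps_after = 2300          # postgresql benchmark after the patch

slowdown = baseline_iops / vm_iops_reported
iops_gain = vm_iops_after / vm_iops_before
tps_gain = tps_after / tps_before
share_of_baseline = vm_iops_after / baseline_iops

print(f"VM was {slowdown:.1f}x slower than the baseline")
print(f"VM random write IOPS improved {iops_gain:.1f}x")
print(f"postgresql TPS improved {tps_gain:.1f}x")
print(f"patched VM reaches {share_of_baseline:.0%} of the baseline")
```

By these numbers the patch roughly doubles postgresql throughput and more than triples VM random write IOPS, while the VM still reaches only about a third of the host-side baseline.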

Comment 5 Yaniv Kaul 2018-11-04 09:03:52 UTC
This was moved to MODIFIED quite some time ago. When will it be available?

Comment 6 Pranith Kumar K 2018-11-05 11:12:36 UTC
(In reply to Yaniv Kaul from comment #5)
> This was moved to MODIFIED quite some time ago. When will it be available?

It is available in release-5 through the bz: https://bugzilla.redhat.com/show_bug.cgi?id=1635972

and it will be available in the next dot-release of 4.1 through https://bugzilla.redhat.com/show_bug.cgi?id=1635980 (patches merged, awaiting release).

This bz only tracks the change on master; I believe it will be closed after release-6.

Comment 7 Yaniv Kaul 2018-11-05 11:31:05 UTC
(In reply to Pranith Kumar K from comment #6)
> (In reply to Yaniv Kaul from comment #5)
> > This was moved to MODIFIED quite some time ago. When will it be available?
> 
> It is available in release-5 through the bz:
> https://bugzilla.redhat.com/show_bug.cgi?id=1635972
> 
> and It will be available in next dot-release in 4.1 through
> https://bugzilla.redhat.com/show_bug.cgi?id=1635980 (Patches merged,
> awaiting release)
> 
> This bz only tracks the change on master, which will be closed after
> release-6 I believe.

That makes little sense to me, process-wise. If it's in master already, great. It needs to be there before being backported. Once it's there, if QE is not testing it, just CLOSE-NEXTRELEASE it.

Comment 8 Pranith Kumar K 2018-11-05 11:57:03 UTC
(In reply to Yaniv Kaul from comment #7)
> (In reply to Pranith Kumar K from comment #6)
> > (In reply to Yaniv Kaul from comment #5)
> > > This was moved to MODIFIED quite some time ago. When will it be available?
> > 
> > It is available in release-5 through the bz:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1635972
> > 
> > and It will be available in next dot-release in 4.1 through
> > https://bugzilla.redhat.com/show_bug.cgi?id=1635980 (Patches merged,
> > awaiting release)
> > 
> > This bz only tracks the change on master, which will be closed after
> > release-6 I believe.
> 
> That makes little sense to me, process-wise. If it's in master already,
> great. It needs to be there before being backported. Once it's there, if QE
> is not testing it, just CLOSE-NEXTRELEASE it.

Let me send a mail to the maintainers mailing list. It would be better to make this change in the automation/process. Only creating and assigning the bz is manual; the rest of the state changes are done by the process.

Comment 9 Karthik U S 2018-11-20 09:12:46 UTC
As part of triaging, we are closing this bug, as it is fixed in the current master and will be available with the next release.
@Pranith did you get a chance to send the mail which you were talking about in comment #8?

Comment 10 Pranith Kumar K 2018-11-23 10:26:53 UTC
(In reply to Karthik U S from comment #9)
> As part of triaging we are closing this bug as this is fixed in the current
> master and will be available with the next release.
> @Pranith did you get a chance to send the mail which you were talking about
> in comment #8?

Yes, I sent the mail.

Comment 11 Worker Ant 2018-12-19 12:38:35 UTC
REVISION POSTED: https://review.gluster.org/21214 (cluster/afr: Add eager-lock stats for better debugging experience) posted (#3) for review on master by Krutika Dhananjay

Comment 12 Worker Ant 2018-12-19 12:39:51 UTC
REVIEW: https://review.gluster.org/21214 (cluster/afr: Add eager-lock stats for better debugging experience) posted (#4) for review on master by Krutika Dhananjay

Comment 13 Shyamsundar 2019-03-25 16:30:43 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/