1375431 – [RFE] enable sharding and strict-o-direct with virt profile - /var/lib/glusterd/groups/virt

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	glusterd
Sub Component:
Version:	mainline
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Krutika Dhananjay
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1375849 1376464 1402215 1402216

Reported:	2016-09-13 06:15 UTC by SATHEESARAN
Modified:	2019-08-28 06:54 UTC
CC List:	6 users
Fixed In Version:	glusterfs-3.10.0
Clone Of:
Clones:	1375849 1376464 1402215 1402216
Environment:
Last Closed:	2017-03-06 17:26:11 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description SATHEESARAN 2016-09-13 06:15:22 UTC

Description of problem:
-----------------------
Sharding appears to be vital for the virt-store use case and needs to be turned on when the volume is optimized for virt store.

The following options need to be added to the virt profile - /var/lib/glusterd/groups/virt

features.shard=on
cluster.data-self-heal-algorithm=full

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
mainline

How reproducible:
-----------------
Not applicable as this is an RFE

Steps to Reproduce:
-------------------
Not applicable as this is an RFE

Actual results:
---------------
Sharding is not enabled by default when the volume is optimized for virt store

Expected results:
-----------------
Sharding should be enabled by default when the gluster volume is optimized for the virt-store use case

Comment 1 SATHEESARAN 2016-09-14 13:08:42 UTC

strict-o-direct also needs to be turned on and remote-dio to be turned off

In total there are 4 options :

features.shard=on
cluster.data-self-heal-algorithm=full
performance.strict-o-direct=on
network.remote-dio=disable
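
As a rough illustration of the four options listed above, the following Python sketch parses key=value lines in the format shown here and builds the equivalent `gluster volume set` commands. The volume name `vmstore` and the helper names are hypothetical, and the exact contents of the shipped group file may differ from this list.

```python
# Hypothetical sketch: turn key=value option lines into gluster CLI commands.
# The option list is the one proposed in this comment, not the shipped file.
PROPOSED_OPTIONS = """\
features.shard=on
cluster.data-self-heal-algorithm=full
performance.strict-o-direct=on
network.remote-dio=disable
"""


def parse_options(text):
    """Parse key=value lines into a dict, skipping blank lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


def to_commands(volume, options):
    """Build one 'gluster volume set' command string per option."""
    return [
        f"gluster volume set {volume} {key} {value}"
        for key, value in options.items()
    ]


if __name__ == "__main__":
    # "vmstore" is a made-up volume name used only for illustration.
    for command in to_commands("vmstore", parse_options(PROPOSED_OPTIONS)):
        print(command)
```

Options that are already part of the group file can instead be applied together with `gluster volume set <VOLNAME> group virt`.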

Comment 2 Worker Ant 2016-12-01 14:34:21 UTC

REVIEW: http://review.gluster.org/15995 (extras: Include shard and full-data-heal in virt group) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

Comment 3 Krutika Dhananjay 2016-12-01 16:54:30 UTC

(In reply to SATHEESARAN from comment #1)
> strict-o-direct also needs to be turned on and remote-dio to be turned off
> 
> In total there are 4 options :
> 
> features.shard=on
> cluster.data-self-heal-algorithm=full
> performance.strict-o-direct=on
> network.remote-dio=disable

Just for the record, the options features.shard=on and cluster.data-self-heal-algorithm=full will be added to group virt. The O_DIRECT options will be skipped, since not all users may want to use the cache=none qemu option, so it is best to configure them separately.
-Krutika

Comment 4 Pranith Kumar K 2016-12-02 05:02:03 UTC

(In reply to Krutika Dhananjay from comment #3)
> Just for the record, the option features.shard=on and
> cluster.data-self-heal-algorithm=full will be added to group virt. O-DIRECT
> options will be skipped since not all users might want to use cache=none
> qemu option, and so it is best to configure them separately.
> -Krutika

odirect options honour the o-direct flag for open. Does qemu always open with o-direct even when cache is not set as 'none'?

Comment 5 Krutika Dhananjay 2016-12-02 05:13:00 UTC

(In reply to Pranith Kumar K from comment #4)
> odirect options honour the o-direct flag for open. Does qemu always open
> with o-direct even when cache is not set as 'none'?

When cache=none is not used, I believe qemu won't be passing O_DIRECT flag.
Now that I remember, Vijay mentioned one more reason: a certain ping-timeout issue which, if fixed, would mean we no longer need to rely on any of the odirect options (even if cache=none is used by qemu).

-Krutika
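
To make the cache-mode question concrete, here is a small Python sketch of which qemu cache modes open the image with O_DIRECT. The mapping comes from general QEMU documentation rather than from anything stated in this bug, so treat it as an assumption and verify it against the QEMU version in use. The function and constant names are hypothetical.

```python
# Assumed mapping (from general QEMU documentation, not from this bug):
# cache modes that bypass the host page cache open the image with O_DIRECT.
KNOWN_CACHE_MODES = {"none", "directsync", "writethrough", "writeback", "unsafe"}
O_DIRECT_CACHE_MODES = {"none", "directsync"}


def uses_o_direct(cache_mode):
    """Return True if the given qemu cache mode is expected to use O_DIRECT."""
    if cache_mode not in KNOWN_CACHE_MODES:
        raise ValueError(f"unknown cache mode: {cache_mode!r}")
    return cache_mode in O_DIRECT_CACHE_MODES


if __name__ == "__main__":
    for mode in sorted(KNOWN_CACHE_MODES):
        print(mode, uses_o_direct(mode))
```

Under this assumption, the odirect volume options only change behaviour for guests whose disks use one of the O_DIRECT cache modes.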

Comment 6 Pranith Kumar K 2016-12-02 05:23:30 UTC

(In reply to Krutika Dhananjay from comment #5)
> When cache=none is not used, I believe qemu won't be passing O_DIRECT flag.
> Now that I remember, there was one more reason Vijay mentioned about a
> certain ping-timeout issue, which if fixed, we won't need to rely on any of
> the odirect option (even if cache=none is used by qemu).
> 
> -Krutika

Okay, so what is the plan for the deployments? Are we going to suggest that users apply the virt profile and explicitly set remote-dio to off in every deployment, considering that cache=none is used by default?

Comment 7 Krutika Dhananjay 2016-12-02 05:51:05 UTC

(In reply to Pranith Kumar K from comment #6)
> Okay, so what is the plan for the deployments? Are we going to suggest users
> to apply virt-profile and explicitly set remote-dio to off in every
> deployment, considering that cache=none is used by default?

cache=none is used by default? Isn't that a very specific case and confined to ovirt users alone?

Comment 8 Pranith Kumar K 2016-12-02 06:07:06 UTC

(In reply to Krutika Dhananjay from comment #7)
> cache=none is used by default? Isn't that a very specific case and confined
> to ovirt users alone?

It seems like quite a few of them are recommending cache as none for different reasons, including proxmox which is a bit popular in gluster-users:
https://pve.proxmox.com/wiki/Performance_Tweaks#Disk_Cache
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Virtualization_Tuning_and_Optimization_Guide/index.html#sect-Virtualization_Tuning_Optimization_Guide-BlockIO-Caching

So I think it is better to put it in the profile than to ask users to change this in every deployment. It also seems to be the safer option, because it eliminates human error where users forget to turn this option off, which may lead to VM pauses.

Comment 9 Krutika Dhananjay 2016-12-02 06:44:43 UTC

(In reply to Pranith Kumar K from comment #8)
> It seems like quite a few of them are recommending cache as none for
> different reasons, including proxmox which is a bit popular in gluster-users:
> https://pve.proxmox.com/wiki/Performance_Tweaks#Disk_Cache
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/
> html-single/Virtualization_Tuning_and_Optimization_Guide/index.html#sect-
> Virtualization_Tuning_Optimization_Guide-BlockIO-Caching
> 
> So I am thinking it is better to put it in the profile than to suggest users
> to change this for all deployments. It seems to be safer option as well
> because it eliminates human errors where users may forget to turn this
> option off which may lead to VM pauses.

I'm not entirely convinced. What if not all users have the kind of heavy workload that was used in testing, which led to the VM pause and required the o-direct options to be enabled? Why should all users suffer the performance penalty associated with having the odirect options set?

Comment 10 Pranith Kumar K 2016-12-02 06:50:39 UTC

(In reply to Krutika Dhananjay from comment #9)
> I'm not entirely convinced. What if not all users have the kind of heavy
> workload that was used in testing which led to VM pause and required
> o-direct options to be enabled? Why should all users suffer performance
> penalty associated with having odirect options set?

Based on the data we have so far, disabling remote-dio and enabling strict-o-direct is safer. Did I get that right? People who want better performance for their workload can choose to enable o-direct, but they know at the time of enabling it that this is the choice they made. The default option should be the safest one.
If we choose remote-dio=enable as the default, people who are not informed enough will learn about the problem only after the VM pauses, and we do not want that.
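
The safe-default argument above can be sketched as a small check over a volume's options. The two option names are the ones listed in comment 1; the function names, the set of accepted boolean spellings, and the choice to flag an unset option are assumptions made for illustration only.

```python
# Hypothetical sketch: flag volumes whose O_DIRECT-related options differ
# from the "safe" settings argued for in this comment.
TRUTHY = {"on", "enable", "yes", "true", "1"}
FALSY = {"off", "disable", "no", "false", "0"}

# Safe settings per this discussion: strict-o-direct on, remote-dio off.
SAFE_SETTINGS = {
    "performance.strict-o-direct": True,
    "network.remote-dio": False,
}


def as_bool(value):
    """Normalize an on/off style option value to a Python bool."""
    normalized = value.strip().lower()
    if normalized in TRUTHY:
        return True
    if normalized in FALSY:
        return False
    raise ValueError(f"not a boolean option value: {value!r}")


def unsafe_settings(options):
    """Return option names that are unset or differ from the safe setting."""
    problems = []
    for name, wanted in SAFE_SETTINGS.items():
        value = options.get(name)
        if value is None or as_bool(value) != wanted:
            problems.append(name)
    return problems


if __name__ == "__main__":
    current = {"network.remote-dio": "enable"}
    print(unsafe_settings(current))
```

A check like this reports the deviation up front, rather than leaving users to discover it after a VM pause.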

Comment 11 Worker Ant 2016-12-02 07:44:46 UTC

REVIEW: http://review.gluster.org/16005 (extras: Add odirect options, shard and full data heal to group virt) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

Comment 12 Pranith Kumar K 2016-12-02 07:53:48 UTC

Thanks Krutika for submitting this version of the patch as well.

Vijay,
     Based on our discussion I am of the opinion that enabling odirect options in the profile is better. Could you let us know if you see any issues with this?

Pranith

Comment 13 Worker Ant 2016-12-05 04:44:16 UTC

COMMIT: http://review.gluster.org/15995 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 45f914ec9c7b15ba8e962b8fae3593f06912c1f0
Author: Krutika Dhananjay <kdhananj>
Date:   Thu Dec 1 17:28:40 2016 +0530

    extras: Include shard and full-data-heal in virt group
    
    Change-Id: Iea66cb017bd1ab62da9cd65895fa65fc6896108b
    BUG: 1375431
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/15995
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 14 Shyamsundar 2017-03-06 17:26:11 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 15 Vijay Bellur 2019-01-10 18:12:29 UTC

Clearing needinfo as this bug is closed now.

Comment 16 Nir Soffer 2019-08-16 23:32:34 UTC

Krutika, according to comment 10, disabling remote-dio and enabling
strict-o-direct should be part of the virt group, but this bug was closed
without adding them.

So it looks like this bug was closed without implementing the requested
feature.

We seem to have issues like this:
https://bugzilla.redhat.com/show_bug.cgi?id=1737256#c10

Because strict-o-direct is not part of the virt group.

Should we file a new RFE for including it in the virt group?

Comment 17 Krutika Dhananjay 2019-08-28 06:54:59 UTC

(In reply to Nir Soffer from comment #16)
> Krutika, according to comment 10, remote-dio and enabling strict-o-direct
> should be part of the virt group, but this bug was closed without adding
> them.
> 
> So it looks like this bug was closed without implementing the requested
> feature.
> 
> We seem to have issues like this:
> https://bugzilla.redhat.com/show_bug.cgi?id=1737256#c10
> 
> Because strict-o-direct is not part of the virt group.
> 
> Should we file a new RFE for including it in the virt group?

Sorry about the late response. I was focusing all my attention on some customer cases over the past few days.

Yeah, I think it's a valid point, given the amount of confusion around it.
Could you file the bz and share the bug-id with me? I'll send a patch after that.

-Krutika
