Description of problem:
-----------------------
Sharding seems to be most vital for the virt store use case, and it needs to be turned on when the volume is optimized for virt store.

The following options need to be added to the virt profile, /var/lib/glusterd/groups/virt:

features.shard=on
cluster.data-self-heal-algorithm=full

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
mainline

How reproducible:
-----------------
Not applicable as this is an RFE

Steps to Reproduce:
-------------------
Not applicable as this is an RFE

Actual results:
---------------
Sharding is not enabled by default when the volume is optimized for virt store

Expected results:
-----------------
Sharding should be enabled by default when the gluster volume is optimized for the virt store use case
strict-o-direct also needs to be turned on and remote-dio needs to be turned off.

In total there are 4 options:

features.shard=on
cluster.data-self-heal-algorithm=full
performance.strict-o-direct=on
network.remote-dio=disable
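The four options above can be read as a group profile in key=value form. The following is a minimal sketch, assuming one key=value pair per line as shown in this report; the helper names and the volume name "vmstore" are hypothetical, and the generated lines only use the standard `gluster volume set <VOLNAME> <KEY> <VALUE>` form.

```python
def parse_group_profile(text):
    """Parse key=value lines (as listed in this report) into a dict."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {line!r}")
        options[key.strip()] = value.strip()
    return options


def render_volume_set_commands(volume, options):
    """Render one 'gluster volume set' command line per option."""
    return [f"gluster volume set {volume} {key} {value}"
            for key, value in options.items()]


# The four options proposed in this report.
PROFILE = """\
features.shard=on
cluster.data-self-heal-algorithm=full
performance.strict-o-direct=on
network.remote-dio=disable
"""

# "vmstore" is a hypothetical volume name used only for illustration.
commands = render_volume_set_commands("vmstore", parse_group_profile(PROFILE))
for command in commands:
    print(command)
```

Once options are part of the group file, they are applied together with `gluster volume set <VOLNAME> group virt`, so per-option commands like the ones printed here are only needed for settings that are left out of the profile.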
REVIEW: http://review.gluster.org/15995 (extras: Include shard and full-data-heal in virt group) posted (#1) for review on master by Krutika Dhananjay (kdhananj)
(In reply to SATHEESARAN from comment #1)
> strict-o-direct also needs to be turned on and remote-dio to be turned off
>
> In total there are 4 options :
>
> features.shard=on
> cluster.data-self-heal-algorithm=full
> performance.strict-o-direct=on
> network.remote-dio=disable

Just for the record, the options features.shard=on and cluster.data-self-heal-algorithm=full will be added to group virt. The O-DIRECT options will be skipped, since not all users might want to use the cache=none qemu option, so it is best to configure them separately.

-Krutika
(In reply to Krutika Dhananjay from comment #3)
> Just for the record, the option features.shard=on and
> cluster.data-self-heal-algorithm=full will be added to group virt. O-DIRECT
> options will be skipped since not all users might want to use cache=none
> qemu option, and so it is best to configure them separately.
> -Krutika

The odirect options honour the O_DIRECT flag passed to open. Does qemu always open with O_DIRECT even when cache is not set to 'none'?
(In reply to Pranith Kumar K from comment #4)
> odirect options honour the o-direct flag for open. Does qemu always open
> with o-direct even when cache is not set as 'none'?

When cache=none is not used, I believe qemu won't be passing the O_DIRECT flag. Now that I remember, there was one more reason: Vijay mentioned a certain ping-timeout issue which, if fixed, would mean we no longer need to rely on any of the odirect options (even if cache=none is used by qemu).

-Krutika
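The question of which qemu cache modes bypass the host page cache can be sketched as a small lookup. This is an illustration only: the mapping below reflects the cache.direct column of the cache-mode table in the QEMU documentation as I understand it, and it should be checked against the QEMU version in use. The function name is hypothetical. Note that, under this mapping, cache=directsync also requests direct I/O, not only cache=none.

```python
# Assumed mapping from qemu "cache=" modes to the cache.direct setting,
# where cache.direct=on means the image is opened for direct I/O (O_DIRECT).
CACHE_MODE_DIRECT = {
    "none": True,
    "directsync": True,
    "writethrough": False,
    "writeback": False,
    "unsafe": False,
}


def uses_o_direct(cache_mode):
    """Return True if the given cache mode requests direct I/O."""
    try:
        return CACHE_MODE_DIRECT[cache_mode]
    except KeyError:
        raise ValueError(f"unknown cache mode: {cache_mode!r}") from None


for mode in CACHE_MODE_DIRECT:
    print(f"cache={mode}: O_DIRECT requested = {uses_o_direct(mode)}")
```

If this mapping holds, the odirect options on the gluster side only change behaviour for guests started with cache=none or cache=directsync.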
(In reply to Krutika Dhananjay from comment #5)
> When cache=none is not used, I believe qemu won't be passing O_DIRECT flag.
> Now that I remember, there was one more reason Vijay mentioned about a
> certain ping-timeout issue, which if fixed, we won't need to rely on any of
> the odirect option (even if cache=none is used by qemu).
>
> -Krutika

Okay, so what is the plan for the deployments? Are we going to suggest that users apply the virt profile and then explicitly set remote-dio to off in every deployment, considering that cache=none is used by default?
(In reply to Pranith Kumar K from comment #6)
> Okay, so what is the plan for the deployments? Are we going to suggest users
> to apply virt-profile and explicitly set remote-dio to off in every
> deployment, considering that cache=none is used by default?

cache=none is used by default? Isn't that a very specific case and confined to ovirt users alone?
(In reply to Krutika Dhananjay from comment #7)
> cache=none is used by default? Isn't that a very specific case and confined
> to ovirt users alone?
It seems that quite a few sources recommend cache=none for different reasons, including Proxmox, which is fairly popular among gluster-users:

https://pve.proxmox.com/wiki/Performance_Tweaks#Disk_Cache
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Virtualization_Tuning_and_Optimization_Guide/index.html#sect-Virtualization_Tuning_Optimization_Guide-BlockIO-Caching

So I am thinking it is better to put it in the profile than to suggest that users change this for every deployment. It also seems to be the safer option, because it eliminates human error: users may forget to turn this option off, which may lead to VM pauses.
(In reply to Pranith Kumar K from comment #8)
> So I am thinking it is better to put it in the profile than to suggest users
> to change this for all deployments. It seems to be safer option as well
> because it eliminates human errors where users may forget to turn this
> option off which may lead to VM pauses.

I'm not entirely convinced. What if not all users have the kind of heavy workload that was used in testing, which led to the VM pause and required the o-direct options to be enabled? Why should all users suffer the performance penalty associated with having the odirect options set?
(In reply to Krutika Dhananjay from comment #9)
> I'm not entirely convinced. What if not all users have the kind of heavy
> workload that was used in testing which led to VM pause and required
> o-direct options to be enabled? Why should all users suffer performance
> penalty associated with having odirect options set?

Based on the data we have so far, disabling remote-dio and enabling strict-o-direct is safer. Did I get that right? People who want better performance for their workload can choose to enable remote-dio, but they know at the time of enabling it that this is the choice they made. The default option should be the safest one. If we choose remote-dio=enable as the default, people who are not informed enough will learn about the problem only after the VM pauses, and we do not want that.
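The "safe by default" position above can be expressed as a simple check on a volume's explicitly set options. This is a minimal sketch, assuming the options are already available as a dict of strings (for example, collected by hand from `gluster volume info`); the function names are hypothetical, and the "safer" values are the ones named in this thread, not an official recommendation.

```python
# Boolean spellings accepted for gluster volume options.
TRUE_VALUES = {"on", "enable", "yes", "true", "1"}
FALSE_VALUES = {"off", "disable", "no", "false", "0"}

# Values described in this thread as the safer choice for VM workloads.
SAFER_ODIRECT = {
    "performance.strict-o-direct": True,
    "network.remote-dio": False,
}


def as_bool(value):
    """Convert a gluster-style boolean string to a Python bool."""
    normalized = value.strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean option value: {value!r}")


def odirect_warnings(options):
    """Return one warning per option that is unset or not the safer value."""
    warnings = []
    for name, wanted in SAFER_ODIRECT.items():
        if name not in options:
            warnings.append(f"{name} is not set explicitly")
        elif as_bool(options[name]) != wanted:
            warnings.append(f"{name}={options[name]} is not the safer value")
    return warnings


# Example: a volume where only remote-dio was set, and set to enable.
for warning in odirect_warnings({"network.remote-dio": "enable"}):
    print(warning)
```

A check like this does not decide the performance trade-off discussed above; it only makes the choice visible before a VM pause reveals it.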
REVIEW: http://review.gluster.org/16005 (extras: Add odirect options, shard and full data heal to group virt) posted (#1) for review on master by Krutika Dhananjay (kdhananj)
Thanks Krutika for submitting this version of the patch as well.

Vijay,
Based on our discussion, I am of the opinion that enabling the odirect options in the profile is better. Could you let us know if you see any issues with this?

Pranith
COMMIT: http://review.gluster.org/15995 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 45f914ec9c7b15ba8e962b8fae3593f06912c1f0
Author: Krutika Dhananjay <kdhananj>
Date: Thu Dec 1 17:28:40 2016 +0530

    extras: Include shard and full-data-heal in virt group

    Change-Id: Iea66cb017bd1ab62da9cd65895fa65fc6896108b
    BUG: 1375431
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/15995
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Vijay Bellur <vbellur>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/
Clearing needinfo as this bug is closed now.
Krutika, according to comment 10, disabling remote-dio and enabling strict-o-direct should be part of the virt group, but this bug was closed without adding them.

So it looks like this bug was closed without implementing the requested feature.

We seem to have issues like this one because strict-o-direct is not part of the virt group:
https://bugzilla.redhat.com/show_bug.cgi?id=1737256#c10

Should we file a new RFE for including it in the virt group?
(In reply to Nir Soffer from comment #16)
> Krutika, according to comment 10, remote-dio and enabling strict-o-direct
> should be part of the virt group, but this bug was closed without adding
> them.
>
> So it looks like this bug was closed without implementing the requested
> feature.
>
> We seems to have issues like this:
> https://bugzilla.redhat.com/show_bug.cgi?id=1737256#c10
>
> Because strict-o-direct is not part of the virt group.
>
> Should we file a new RFE for including it in the virt group?

Sorry about the late response. I have been focusing all my attention on some cu cases for the past few days.

Yeah, I think it's a valid point, given the amount of confusion around it. Could you file the bz and share the bug-id with me? I'll send a patch after that.

-Krutika