Bug 1486115

Summary: gluster-block profile needs to have strict-o-direct
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Pranith Kumar K <pkarampu>
Component: gluster-block
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED ERRATA
QA Contact: Sweta Anandpara <sanandpa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: amukherj, bugs, mchangir, mpillai, rcyriac, rhs-bugs, sanandpa, storage-qa-internal
Target Milestone: ---
Target Release: RHGS 3.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.8.4-43
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1485962
Environment:
Last Closed: 2017-09-21 05:06:51 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1485962, 1486122
Bug Blocks: 1417151

Description Pranith Kumar K 2017-08-29 05:13:48 UTC
+++ This bug was initially created as a clone of Bug #1485962 +++

Description of problem:
    tcmu-runner will no longer open the block device with O_SYNC, so writes
    can get cached in write-behind. When that happens, there is a chance that
    on failover some data is still sitting in the cache and gets lost, so
    performance.strict-o-direct should be turned on in the gluster-block profile.
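
    The same setting can also be applied to an existing volume without
    re-applying the whole group profile; a minimal sketch, assuming a volume
    named <volname> (option names as shown in the verification log below):

        gluster volume set <volname> performance.strict-o-direct on
        gluster volume set <volname> network.remote-dio disable

    With remote-dio disabled the O_DIRECT flag is passed through to the bricks,
    and with strict-o-direct on, write-behind honours O_DIRECT writes instead
    of caching them.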


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Worker Ant on 2017-08-28 10:52:31 EDT ---

REVIEW: https://review.gluster.org/18120 (gluster-block: strict-o-direct should be on) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 2 Pranith Kumar K 2017-08-29 05:14:42 UTC
https://review.gluster.org/#/c/18120/
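
The patch adds performance.strict-o-direct=on to the gluster-block group profile that ships with glusterfs. A hedged sketch of the resulting profile file follows; the path is the usual glusterd group-file location and the entries are reconstructed from the options visible in the verification log below, so the exact contents may differ between builds:

/var/lib/glusterd/groups/gluster-block (illustrative):
    performance.quick-read=off
    performance.read-ahead=off
    performance.io-cache=off
    performance.stat-prefetch=off
    performance.open-behind=off
    performance.readdir-ahead=off
    performance.strict-o-direct=on
    network.remote-dio=disable
    cluster.eager-lock=disable
    cluster.quorum-type=auto
    cluster.data-self-heal-algorithm=full
    cluster.locking-scheme=granular
    cluster.shd-max-threads=8
    cluster.shd-wait-qlength=10000
    features.shard=on
    features.shard-block-size=64MB
    user.cifs=off
    server.allow-insecure=on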

Comment 3 Pranith Kumar K 2017-08-29 06:24:54 UTC
Manoj,
    This will change our perf benchmarks as write-behind was caching before this patch. Wanted to let you know that we need to do perf benchmark again. Leaving a needinfo at the moment, not sure if there is any other way to notify you on the bz.

Pranith
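
For re-running the benchmarks, an illustrative direct-I/O fio job against the exported block device would exercise the changed path; this is only a sketch (device path, block size and runtime are placeholders, not the actual benchmark setup):

    fio --name=blockwrite --filename=/dev/sdX --direct=1 --ioengine=libaio \
        --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --time_based

With strict-o-direct on, O_DIRECT writes are no longer absorbed by write-behind, so lower write numbers than in earlier runs are expected.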

Comment 7 Manoj Pillai 2017-08-29 07:28:19 UTC
(In reply to Pranith Kumar K from comment #3)
> Manoj,
>     This will change our perf benchmarks as write-behind was caching before
> this patch. Wanted to let you know that we need to do perf benchmark again.
> Leaving a needinfo at the moment, not sure if there is any other way to
> notify you on the bz.
> 
> Pranith

Ack. That should be good enough to clear the needinfo? :)

Comment 9 Sweta Anandpara 2017-09-14 08:46:42 UTC
Tested and verified this on the build glusterfs-3.8.4-44.el7rhgs.x86_64, gluster-block-0.2.1-11.el7rhgs.x86_64 and tcmu-runner-1.2.0-14.el7rhgs.x86_64.

'gluster volume set <volname> group gluster-block' does set the new option performance.strict-o-direct to on, along with the other options that are already part of the profile.

Moving this bug to verified for rhgs 3.3.0. Logs are pasted below.

[root@dhcp47-117 ~]# gluster v create ozone replica 3 10.70.47.121:/bricks/brick8/ozone_0 10.70.47.113:/bricks/brick8/ozone_1 10.70.47.114:/bricks/brick8/ozone_2 10.70.47.115:/bricks/brick8/ozone_3 10.70.47.116:/bricks/brick8/ozone_4 10.70.47.117:/bricks/brick8/ozone_5
volume create: ozone: success: please start the volume to access data
[root@dhcp47-117 ~]# gluster v info ozone
 
Volume Name: ozone
Type: Distributed-Replicate
Volume ID: dacd299a-23f3-4ab9-a5ac-0cfb26e77223
Status: Created
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.47.121:/bricks/brick8/ozone_0
Brick2: 10.70.47.113:/bricks/brick8/ozone_1
Brick3: 10.70.47.114:/bricks/brick8/ozone_2
Brick4: 10.70.47.115:/bricks/brick8/ozone_3
Brick5: 10.70.47.116:/bricks/brick8/ozone_4
Brick6: 10.70.47.117:/bricks/brick8/ozone_5
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable
[root@dhcp47-117 ~]# 
[root@dhcp47-117 ~]# gluster v set ozone group gluster-block
volume set: success
[root@dhcp47-117 ~]# gluster v info ozone
 
Volume Name: ozone
Type: Distributed-Replicate
Volume ID: dacd299a-23f3-4ab9-a5ac-0cfb26e77223
Status: Created
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.47.121:/bricks/brick8/ozone_0
Brick2: 10.70.47.113:/bricks/brick8/ozone_1
Brick3: 10.70.47.114:/bricks/brick8/ozone_2
Brick4: 10.70.47.115:/bricks/brick8/ozone_3
Brick5: 10.70.47.116:/bricks/brick8/ozone_4
Brick6: 10.70.47.117:/bricks/brick8/ozone_5
Options Reconfigured:
server.allow-insecure: on
user.cifs: off
features.shard-block-size: 64MB
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.quorum-type: auto
cluster.eager-lock: disable
network.remote-dio: disable
performance.strict-o-direct: on
performance.readdir-ahead: off
performance.open-behind: off
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable
[root@dhcp47-117 ~]# rpm -qa | grep gluster
glusterfs-3.8.4-44.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-44.el7rhgs.x86_64
glusterfs-api-3.8.4-44.el7rhgs.x86_64
glusterfs-fuse-3.8.4-44.el7rhgs.x86_64
python-gluster-3.8.4-44.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.2.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-44.el7rhgs.x86_64
glusterfs-server-3.8.4-44.el7rhgs.x86_64
glusterfs-rdma-3.8.4-44.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
glusterfs-cli-3.8.4-44.el7rhgs.x86_64
glusterfs-libs-3.8.4-44.el7rhgs.x86_64
glusterfs-events-3.8.4-44.el7rhgs.x86_64
gluster-block-0.2.1-11.el7rhgs.x86_64
[root@dhcp47-117 ~]# 
[root@dhcp47-117 ~]# rpm -qa | grep tcmu-runner
tcmu-runner-1.2.0-14.el7rhgs.x86_64
[root@dhcp47-117 ~]#
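
In addition to the full 'gluster v info' output above, the single option can be queried directly with the standard volume get syntax; the output line shown here is illustrative:

[root@dhcp47-117 ~]# gluster volume get ozone performance.strict-o-direct
Option                                   Value
------                                   -----
performance.strict-o-direct             on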

Comment 11 errata-xmlrpc 2017-09-21 05:06:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774