Bug 1314421 - [HC] Ensure o-direct behaviour when sharding is enabled on volume and files opened with o_direct
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: core
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.3
Assigned To: Krutika Dhananjay
Keywords: ZStream
Duplicates: 1322014
Depends On:
Blocks: Gluster-HC-1 1311817 1322214 1325843 1335284 1339136
Reported: 2016-03-03 09:56 EST by Sanjay Rao
Modified: 2016-09-17 10:41 EDT
CC: 12 users

See Also:
Fixed In Version: glusterfs-3.7.9-9
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1322214
Last Closed: 2016-06-23 01:10:23 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Sanjay Rao 2016-03-03 09:56:30 EST
Description of problem:
In a RHEV-RHGS hyperconverged environment, adding a disk to a VM from a glusterfs storage pool fails when glusterfs is running in posix/directIO mode.

The gluster volume is configured to run in directIO mode by adding

option o-direct on

to the /var/lib/glusterd/vols/gl_01/*.vol files. Example below:

volume gl_01-posix
    type storage/posix
    option o-direct on
    option brick-gid 36
    option brick-uid 36
    option volume-id c131155a-d40c-4d9e-b056-26c61b924c26
    option directory /bricks/b01/g

When the option is removed and the volume is restarted, disks can be added to the VM from the glusterfs pool.

Version-Release number of selected component (if applicable):

RHEV version is RHEV 3.6


How reproducible:
Easily reproducible

Steps to Reproduce:
1. Create a GlusterFS storage pool in a RHEV environment.
2. Configure GlusterFS in posix/directIO mode.
3. Create a new VM or add a disk to an existing VM. The add-disk step fails.

Actual results:

Expected results:

Additional info:
Comment 2 Krutika Dhananjay 2016-03-17 08:11:14 EDT
Hi Sanjay,

In light of the recent discussion we had with regard to direct-io behavior on a mail thread, I have the following question:

Assuming the 'cache=none' command-line option implies that the VM image files will all be opened with the O_DIRECT flag (which means the write buffers will already be aligned with the "sector size of the underlying block device"), the only layer in the combined client-server stack that could prevent us from achieving o-direct-like behavior through caching would be the write-behind translator.

Therefore, I am wondering whether enabling 'performance.strict-o-direct' is sufficient to achieve the behavior you expect to see with o-direct.
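For context, O_DIRECT requires write buffers (and typically offsets and lengths) to be aligned to the underlying device's sector size, which is what qemu's cache=none arranges on its side. Below is a minimal, illustrative Python sketch of that alignment requirement; the page-size alignment and the buffer allocation scheme are assumptions for illustration (the real requirement may be 512 or 4096 bytes depending on the device):

```python
import ctypes
import mmap

# Assumed alignment; the real O_DIRECT alignment is device-dependent.
ALIGN = mmap.PAGESIZE

def aligned_buffer(size, align=ALIGN):
    # Allocate `size` bytes starting at an address that is a multiple
    # of `align`, as O_DIRECT requires for write buffers.
    raw = ctypes.create_string_buffer(size + align)
    addr = ctypes.addressof(raw)
    offset = (align - addr % align) % align
    return (ctypes.c_char * size).from_buffer(raw, offset)

buf = aligned_buffer(4096)
print(ctypes.addressof(buf) % ALIGN)  # 0: the buffer is suitably aligned
# An O_DIRECT open would then look like (not executed here):
#   fd = os.open("/path/to/image", os.O_RDWR | os.O_DIRECT)
```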

Comment 3 Sanjay Rao 2016-03-17 08:20:02 EDT
I have tested with different options. The only option that enabled true directIO on the glusterfs server was the posix setting.

I can verify again with performance.strict-o-direct on the recent glusterfs version (glusterfs-server-3.7.5-18.33) installed on my system, just to be sure.
Comment 4 Krutika Dhananjay 2016-04-11 02:57:35 EDT
Upstream patch at http://review.gluster.org/13846
Moving the state of the bug to POST.
Comment 8 Sahina Bose 2016-04-28 05:17:21 EDT
Moving back to Assigned with comments from Vijay:
The current behavior in sharding is the following:

1. open the base/first shard with O_DIRECT

2. open the subsequent shards without O_DIRECT. All write operations
are converted to write + fsync operations to minimize the usage of
page cache.

With the planned patch, sharding will be opening non-first shards with
O_DIRECT to completely eliminate any usage of page cache.
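The behavior change can be pictured as follows. This is an illustrative Python model, not the actual shard translator code (the real fix is in gluster's C internals, around shard and fd management):

```python
import os

def nonfirst_shard_flags(original_flags, inherit_odirect):
    # Illustrative model of the open flags used for non-first shards.
    # Old behavior: O_DIRECT is dropped and each write is instead
    # followed by an fsync; new behavior: the original fd's O_DIRECT
    # flag is inherited, so the page cache is bypassed entirely.
    if inherit_odirect:
        return original_flags
    return original_flags & ~os.O_DIRECT

orig = os.O_RDWR | os.O_DIRECT
assert nonfirst_shard_flags(orig, inherit_odirect=False) == os.O_RDWR
assert nonfirst_shard_flags(orig, inherit_odirect=True) & os.O_DIRECT
```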
Comment 9 Sahina Bose 2016-04-28 05:44:31 EDT
*** Bug 1322014 has been marked as a duplicate of this bug. ***
Comment 10 Krutika Dhananjay 2016-05-03 09:30:57 EDT
Comment 14 SATHEESARAN 2016-06-03 06:56:57 EDT
The solution to the VM pause issue seen in BZ https://bugzilla.redhat.com/show_bug.cgi?id=1339136 is to make sharding honor O_DIRECT.

So this bug needs to be proposed for RHGS 3.1.3.
Comment 18 Krutika Dhananjay 2016-06-06 07:43:42 EDT
The fix for making individual shards inherit the original fd's flags involves changes to the management of anonymous fds (fd.c). One major consumer of anonymous fds apart from sharding is gluster-NFS. So once this patch lands, it would be good to verify that it doesn't break existing functionality on NFS. Specifically, fd-based operations (reads and writes) need to be tested on NFS mounts to ensure they work fine. In this regard, it would be good to also use fd flags like O_DIRECT, O_SYNC, and O_DSYNC from the application.

Comment 19 Atin Mukherjee 2016-06-07 06:20:05 EDT
All the required patches are pulled into downstream now:

> http://review.gluster.org/14271
> http://review.gluster.org/10219
> http://review.gluster.org/14215
> http://review.gluster.org/14191
> http://review.gluster.org/14639
> http://review.gluster.org/14623

Moving the state to Modified
Comment 21 SATHEESARAN 2016-06-13 05:11:29 EDT
Tested with the RHGS 3.1.3 build (glusterfs-3.7.9-10.el7rhgs), with the following tests:

1. Created a replica 3 volume
2. Disabled remote-dio and enabled strict-o-direct on the volume
3. Created a RHEV data domain backed by the above created volume
4. Ran 'strace' on the brick process while a 100% write workload was running with fio.

All the shards are opened with O_DIRECT, as expected:

./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR|O_DIRECT) = 115 <0.000023>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/1e/d4/1ed4977a-3c75-459d-af88-a21d50190bd3", O_RDWR|O_DIRECT) = 115 <0.000019>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR|O_DIRECT) = 114 <0.000025>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR|O_DIRECT) = 114 <0.000020>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR) = 114 <0.000020>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR|O_DIRECT) = 114 <0.000035>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR|O_DIRECT) = 115 <0.000023>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR|O_DIRECT) = 115 <0.000023>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR) = 114 <0.000024>
./.30489:openat(AT_FDCWD, "/rhgs/brick1/vmb1/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 115 <0.000027>
./.30489:openat(AT_FDCWD, "/rhgs/brick1/vmb1/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 115 <0.000020>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR|O_DIRECT) = 115 <0.000027>
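A quick way to summarize a trace like the one above is to count how many open()/openat() calls carry O_DIRECT. The snippet below is a sketch over two sample lines shaped like the output above (paths shortened for illustration):

```python
import re

# Two sample lines shaped like the strace output above (paths shortened).
trace = """\
open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a", O_RDWR|O_DIRECT) = 115 <0.000023>
open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a", O_RDWR) = 114 <0.000020>
"""

# Extract each open()/openat() call and check its flags for O_DIRECT.
calls = re.findall(r'open(?:at)?\([^)]*\)', trace)
direct = [c for c in calls if "O_DIRECT" in c]
print(f"{len(direct)} of {len(calls)} opens used O_DIRECT")  # 1 of 2 opens used O_DIRECT
```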
Comment 23 errata-xmlrpc 2016-06-23 01:10:23 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

