Description of problem:
In a RHEV-RHGS hyperconverged environment, adding a disk to a VM from a glusterfs storage pool fails when glusterfs is running in posix/directIO mode.

The gluster volume is configured to run in directIO mode by adding "option o-direct on" in the /var/lib/glusterd/vols/gl_01/*.vol files. Example below:

volume gl_01-posix
    type storage/posix
    option o-direct on
    option brick-gid 36
    option brick-uid 36
    option volume-id c131155a-d40c-4d9e-b056-26c61b924c26
    option directory /bricks/b01/g
end-volume

When the option is removed and the volume is restarted, disks can be added to the VM from the glusterfs pool.

Version-Release number of selected component (if applicable):
RHEV version is RHEV 3.6
glusterfs-client-xlators-3.7.5-11.el7rhgs.x86_64
glusterfs-cli-3.7.5-11.el7rhgs.x86_64
glusterfs-libs-3.7.5-11.el7rhgs.x86_64
glusterfs-3.7.5-11.el7rhgs.x86_64
glusterfs-api-3.7.5-11.el7rhgs.x86_64
glusterfs-fuse-3.7.5-11.el7rhgs.x86_64
glusterfs-server-3.7.5-11.el7rhgs.x86_64

How reproducible:
Easily reproducible

Steps to Reproduce:
1. Create a GlusterFS storage pool in a RHEV environment.
2. Configure GlusterFS in posix/directIO mode.
3. Create a new VM or add a disk to an existing VM.

Actual results:
Adding the disk to the VM from the glusterfs storage pool fails.

Expected results:
The disk is added successfully.

Additional info:
Hi Sanjay,

In light of the recent discussion we had wrt direct-io behavior on a mail thread, I have the following question:

Assuming the 'cache=none' command line option implies that the vm image files will all be opened with the O_DIRECT flag (which means that the write buffers will already be aligned with the "sector size of the underlying block device"), the only layer in the combined client-server stack that could prevent us from achieving o-direct-like behavior because of caching would be the write-behind translator. Therefore, I am wondering whether enabling 'performance.strict-o-direct' is sufficient to achieve the behavior you expect to see with o-direct?

-Krutika
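(Not from this bug, just an illustration: a minimal C sketch of what 'cache=none' implies at the syscall level -- the image file is opened with O_DIRECT, so the write buffers, offsets and lengths have to be aligned to the underlying device's sector size. The file path and the 4096-byte alignment below are assumptions.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* hypothetical image path on a fuse-mounted gluster volume */
    int fd = open("/mnt/gl_01/images/disk01.img", O_RDWR | O_DIRECT);
    if (fd < 0)
        return 1;

    /* O_DIRECT requires sector-aligned buffers; 4096 bytes is assumed here */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096) != 0)
        return 1;
    memset(buf, 0, 4096);

    /* this write bypasses the client page cache; on the gluster stack the
     * remaining caching layer of interest is the write-behind translator */
    ssize_t n = pwrite(fd, buf, 4096, 0);

    free(buf);
    close(fd);
    return n == 4096 ? 0 : 1;
}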
I have tested with different options. The only option that enabled true directIO on the glusterfs server was the posix setting. I can verify again with performance.strict-o-direct on the recent glusterfs version (glusterfs-server-3.7.5-18.33) installed on my system, just to be sure.
Upstream patch at http://review.gluster.org/13846

Moving the state of the bug to POST.
Moving back to Assigned with comments from Vijay:

The current behavior in sharding is the following:
1. Open the base/first shard with O_DIRECT.
2. Open the subsequent shards without O_DIRECT. All write operations are converted to write + fsync operations to minimize the usage of the page cache.

With the planned patch, sharding will open non-first shards with O_DIRECT to completely eliminate any usage of the page cache.
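(Not the shard translator code itself -- just a simplified C sketch contrasting the two behaviors described above: the current write + fsync fallback for shards opened without O_DIRECT versus the planned O_DIRECT open. The shard path is hypothetical.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* current behavior for non-first shards: plain open, then write + fsync so
 * that dirty pages do not linger in the page cache */
static int write_then_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    int ok = (pwrite(fd, buf, len, 0) == (ssize_t)len) && (fsync(fd) == 0);
    close(fd);
    return ok ? 0 : -1;
}

/* planned behavior: open the shard with O_DIRECT (as the base shard already
 * is), bypassing the page cache; buf must then be sector-aligned */
static int write_direct(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_RDWR | O_DIRECT);
    if (fd < 0)
        return -1;
    int ok = (pwrite(fd, buf, len, 0) == (ssize_t)len);
    close(fd);
    return ok ? 0 : -1;
}

int main(void)
{
    void *buf;
    if (posix_memalign(&buf, 4096, 4096))   /* aligned so O_DIRECT works */
        return 1;
    memset(buf, 0, 4096);

    /* hypothetical shard path under the brick used in this bug report */
    int rc = write_then_fsync("/bricks/b01/g/.shard/shard.1", buf, 4096);
    if (rc == 0)
        rc = write_direct("/bricks/b01/g/.shard/shard.1", buf, 4096);

    free(buf);
    return rc == 0 ? 0 : 1;
}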
*** Bug 1322014 has been marked as a duplicate of this bug. ***
http://review.gluster.org/#/c/14191/
The solution to the VM pause issue seen in BZ https://bugzilla.redhat.com/show_bug.cgi?id=1339136 is to make sharding honor O_DIRECT. So this bug needs to be proposed for RHGS 3.1.3.
The fix for making individual shards inherit the original fd's flags involves changes to the management of anon fds (fd.c). One major consumer of anon fds apart from sharding is gluster-NFS, so once this patch lands, it would be good to verify that it doesn't break the existing functionality on NFS. Specifically, fd-based operations (reads and writes) need to be tested on NFS mounts to ensure they work fine. In this regard, it would be good to also use fd flags like O_DIRECT, O_SYNC, and O_DSYNC from the application.

-Krutika
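(A minimal sketch of the kind of application-level fd test suggested above, assuming a hypothetical NFS mount point and a 4096-byte block size: open a file with O_DIRECT and O_DSYNC, then do an aligned write and read-back through the same fd.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* hypothetical NFS mount of the gluster volume */
    const char *path = "/mnt/nfs/gl_01/odirect-test.bin";
    size_t blk = 4096;   /* assumed block size for alignment */

    int fd = open(path, O_RDWR | O_CREAT | O_DIRECT | O_DSYNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    void *wbuf, *rbuf;
    if (posix_memalign(&wbuf, blk, blk) || posix_memalign(&rbuf, blk, blk))
        return 1;
    memset(wbuf, 'x', blk);

    /* write and read back through the same fd to exercise the read and
     * write paths that gluster-NFS serves via anon fds */
    if (pwrite(fd, wbuf, blk, 0) != (ssize_t)blk) {
        perror("pwrite");
        return 1;
    }
    if (pread(fd, rbuf, blk, 0) != (ssize_t)blk) {
        perror("pread");
        return 1;
    }
    printf("read back matches write: %s\n",
           memcmp(wbuf, rbuf, blk) == 0 ? "yes" : "no");

    free(wbuf);
    free(rbuf);
    close(fd);
    return 0;
}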
All the required patches are pulled into downstream now:
> http://review.gluster.org/14271
> http://review.gluster.org/10219
> http://review.gluster.org/14215
> http://review.gluster.org/14191
> http://review.gluster.org/14639
> http://review.gluster.org/14623

Moving the state to Modified.
Tested with RHGS 3.1.3 build - glusterfs-3.7.9-10.el7rhgs - with the following tests:

1. Created a replica 3 volume.
2. Disabled remote-dio and enabled strict-o-direct on the volume.
3. Created a RHEV data domain backed by the above created volume.
4. Ran 'strace' on the brick process while a 100% write workload was running with fio.

All the shards are opened with O_DIRECT as expected:

./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR|O_DIRECT) = 115 <0.000023>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/1e/d4/1ed4977a-3c75-459d-af88-a21d50190bd3", O_RDWR|O_DIRECT) = 115 <0.000019>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR|O_DIRECT) = 114 <0.000025>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR|O_DIRECT) = 114 <0.000020>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR) = 114 <0.000020>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR|O_DIRECT) = 114 <0.000035>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR|O_DIRECT) = 115 <0.000023>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR|O_DIRECT) = 115 <0.000023>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR) = 114 <0.000024>
./.30489:openat(AT_FDCWD, "/rhgs/brick1/vmb1/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 115 <0.000027>
./.30489:openat(AT_FDCWD, "/rhgs/brick1/vmb1/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 115 <0.000020>
./.30489:open("/rhgs/brick1/vmb1/.glusterfs/aa/59/aa59991a-31b5-41e6-87e5-e83e7bd4a082", O_RDWR|O_DIRECT) = 115 <0.000027>
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240