Bug 1375959 - Files not being opened with o_direct flag during random read operation (Glusterfs 3.8.2)
Summary: Files not being opened with o_direct flag during random read operation (Glusterfs 3.8.2)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 3.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Krutika Dhananjay
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1377556 1378695 1378814 1380638
 
Reported: 2016-09-14 10:49 UTC by Shekhar Berry
Modified: 2016-10-20 14:03 UTC
CC List: 11 users

Fixed In Version: glusterfs-3.8.5
Clone Of:
Clones: 1377556 1378695 1378814
Environment:
Last Closed: 2016-10-20 14:03:22 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Vmstat output for duration of random read test (11.71 KB, text/plain)
2016-09-14 10:49 UTC, Shekhar Berry

Description Shekhar Berry 2016-09-14 10:49:12 UTC
Created attachment 1200788 [details]
Vmstat output for duration of random read test

Description of problem:

Hi,

I am running a random-workload test using pbench_fio and passing the Direct=1 flag to bypass the cache. On the server side I have performance.strict-o-direct: on.
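
For reference, a minimal sketch of the kind of client-side job and server-side option described above; the mount point, file sizes, and fio job parameters are illustrative assumptions, not the exact pbench_fio invocation used:

# illustrative fio random read job with O_DIRECT requested from the client
fio --name=randread --directory=/mnt/rep2 --rw=randread \
    --bs=4k --size=4g --numjobs=4 --ioengine=libaio \
    --direct=1 --runtime=300 --time_based

# corresponding server-side volume option
gluster volume set rep2 performance.strict-o-direct on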

The expectation was that, for all workloads, no activity would be seen in the server cache. On the contrary, I am seeing the full 48G of server memory being utilized during the random read workload.

I collected strace on the brick process and saw no files being opened with the O_DIRECT flag. vmstat data collected on the server also shows full utilization of the 48G of server memory.

I have a 6-node server setup and I am running the workload against it from 5 clients, on a replica 2 volume. Here is the vol info:

Volume Name: rep2
Type: Distributed-Replicate
Volume ID: 08bc410b-fba4-4a81-b918-fe5239947eef
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 172.17.40.13:/bricks/b01/g
Brick2: 172.17.40.14:/bricks/b01/g
Brick3: 172.17.40.15:/bricks/b01/g
Brick4: 172.17.40.16:/bricks/b01/g
Brick5: 172.17.40.22:/bricks/b01/g
Brick6: 172.17.40.24:/bricks/b01/g
Options Reconfigured:
performance.quick-read: off
performance.open-behind: off
performance.strict-o-direct: on
transport.address-family: inet
performance.readdir-ahead: on


Version-Release number of selected component (if applicable):

On server side:

rpm -qa | grep gluster
glusterfs-cli-3.8.2-1.el7.x86_64
glusterfs-3.8.2-1.el7.x86_64
glusterfs-api-3.8.2-1.el7.x86_64
glusterfs-libs-3.8.2-1.el7.x86_64
glusterfs-fuse-3.8.2-1.el7.x86_64
glusterfs-client-xlators-3.8.2-1.el7.x86_64
glusterfs-server-3.8.2-1.el7.x86_64

On client side:

rpm -qa | grep gluster
glusterfs-client-xlators-3.8.2-1.el7.x86_64
glusterfs-libs-3.8.2-1.el7.x86_64
glusterfs-3.8.2-1.el7.x86_64
glusterfs-fuse-3.8.2-1.el7.x86_64


How reproducible:

It is reproducible every time.

Steps to Reproduce:
1. Create a volume and set performance.strict-o-direct: on
2. Run a random read workload using fio or any other equivalent workload generator
3. Observe vmstat and collect strace output using strace -ff -T -p <pid> -o <path-where-you-want-the-logs-captured>. To get the PID: ps aux | grep glusterfsd | grep <volname>. (A sketch of these steps follows below.)
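
A minimal sketch of the steps above, assuming a volume named rep2 with a single brick process per server and strace output written under /tmp; the paths are illustrative:

# step 1: enable strict O_DIRECT handling on the volume
gluster volume set rep2 performance.strict-o-direct on

# step 3: find the brick PID and attach strace while the workload from step 2 runs
BRICK_PID=$(ps aux | grep glusterfsd | grep rep2 | grep -v grep | awk '{print $2}')
strace -ff -T -p "$BRICK_PID" -o /tmp/strace.log &

# watch page-cache usage on the server for the duration of the run
vmstat 5

# afterwards, check whether any file was opened with O_DIRECT
grep -i "o_direct" /tmp/strace.log.*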

Actual results:

The full 48G of cache memory is being used.

Expected results:

With performance.strict-o-direct: on set, the expectation was that I/O would bypass the cache completely.

Additional info:

Output of strace (note that no file is being opened with the O_DIRECT flag):

 grep -i "o_direct" strace.log.*
strace.log.13755:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000063>
strace.log.13755:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/dirty", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000013>
strace.log.13755:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/entry-changes", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000013>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000118>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000114>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000142>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000141>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000141>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000141>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000023>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000089>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000058>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000094>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 26 <0.000104>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000028>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000099>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000018>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 26 <0.000106>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000028>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000077>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000074>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000119>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000028>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000036>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000028>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000109>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000028>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000029>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000015>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 26 <0.000015>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000027>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000050>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 30 <0.000034>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000018>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 26 <0.000128>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000014>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000028>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000014>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000038>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 26 <0.000111>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000014>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000017>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000025>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000028>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000026>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000075>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000022>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000013>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000013>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000025>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000021>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000022>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000016>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 26 <0.000080>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000027>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000018>
strace.log.14358:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000022>
strace.log.14358:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000026>
strace.log.14358:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000017>
strace.log.14359:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000014>

Comment 2 Atin Mukherjee 2016-09-14 17:28:19 UTC
Shekhar - please change the product of this BZ to GlusterFS as the testing was done with glusterfs 3.8.2 bits.

Comment 3 Pranith Kumar K 2016-09-14 17:39:06 UTC
Shekhar,
   You need the following two options for O_DIRECT to be handled properly:

1) performance.strict-o-direct on
2) network.remote-dio off

Pranith
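
A minimal sketch of setting the two options above on the volume from this report (rep2), using the standard gluster volume set syntax:

gluster volume set rep2 performance.strict-o-direct on
gluster volume set rep2 network.remote-dio off

# confirm the reconfigured options
gluster volume info rep2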

Comment 4 Shekhar Berry 2016-09-14 17:45:06 UTC
Pranith,

To enable O_DIRECT I turned performance.strict-o-direct on, and network.remote-dio was disabled by default.

performance.quick-read: off
performance.open-behind: off

are two options which I tried as part of a troubleshooting exercise with Krutika and Raghavendra.

I am reopening this issue.

--Shekhar

Comment 5 Krutika Dhananjay 2016-09-14 18:17:53 UTC
(In reply to Pranith Kumar K from comment #3)
> Shekhar,
>    You need the following two options for O_DIRECT to be handled properly:
> 
> 1) performance.strict-o-direct on
> 2) network.remote-dio off
> 
> Pranith

@Pranith - Part of the problem seems to be that the anonymous fds created by open-behind don't inherit the original open() flags. And when ob does wind reads and writes on these anon fds, posix winds up using the usual GF_ANON_FD_FLAGS (O_RDWR|O_LARGEFILE) value at the time of open() preceding the invoked fd operation (mostly read/write).

FWIW, I did try the same test Shekhar ran on my test cluster and disabling open-behind in addition to the two already known o-direct options that you mentioned above, seemed to fix the issue of growing cache size as the test progresses (although it didn't have any effect as per Shekhar when he tried the same).

I am yet to confirm the theory (and the presence/absence of O_DIRECT flag at the level of posix) through strace output of the bricks with and without open-behind. I will do the same tomorrow and update the bug.

-Krutika
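
One way the presence or absence of O_DIRECT at the posix level could be checked from the brick strace output (a sketch; the fd number and log file name below are examples, not findings from this report):

# opens that did carry O_DIRECT
grep 'O_DIRECT' strace.log.*

# opens under the brick's .glusterfs directory that did not carry O_DIRECT
# (candidates for the anon-fd default flags mentioned above)
grep '\.glusterfs' strace.log.* | grep 'open' | grep -v O_DIRECT

# reads issued against one specific returned fd, e.g. fd 33
grep -E 'pread(64)?\(33,' strace.log.13755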

Comment 6 Pranith Kumar K 2016-09-14 19:26:23 UTC
(In reply to Shekhar Berry from comment #4)
> Pranith,
> 
> To enable o_direct I turned performance.strict-o-direct on and my
> network.remote-dio was disabled by default. 
> 
> performance.quick-read: off
> performance.open-behind: off
> 
> are two options which I tried as part of troubleshooting exercise with
> Krutika and Raghavendra.
> 
> I am reopening this issue.
> 
> --Shekhar

I was so confident this was not a bug; sorry about closing it before coming to a conclusion :-(

Comment 7 Pranith Kumar K 2016-09-14 19:30:03 UTC
(In reply to Krutika Dhananjay from comment #5)
> (In reply to Pranith Kumar K from comment #3)
> > Shekhar,
> >    You need the following two options for O_DIRECT to be handled properly:
> > 
> > 1) performance.strict-o-direct on
> > 2) network.remote-dio off
> > 
> > Pranith
> 
> @Pranith - Part of the problem seems to be that the anonymous fds created by
> open-behind don't inherit the original open() flags. And when ob does wind
> reads and writes on these anon fds, posix winds up using the usual
> GF_ANON_FD_FLAGS (O_RDWR|O_LARGEFILE) value at the time of open() preceding
> the invoked fd operation (mostly read/write).

As soon as the first write comes, an OPEN FOP with the same flags that came at the time of open() is sent out from open-behind; from then on even the reads are served using this fd. Could you confirm whether the test includes only reads and no writes at all? In that case we have a bug we need to look into.

> 
> FWIW, I did try the same test Shekhar ran on my test cluster and disabling
> open-behind in addition to the two already known o-direct options that you
> mentioned above, seemed to fix the issue of growing cache size as the test
> progresses (although it didn't have any effect as per Shekhar when he tried
> the same).
> 
> I am yet to confirm the theory (and the presence/absence of O_DIRECT flag at
> the level of posix) through strace output of the bricks with and without
> open-behind. I will do the same tomorrow and update the bug.

Yes, this would be important to know.

> 
> -Krutika

Comment 8 Krutika Dhananjay 2016-09-15 13:28:42 UTC
(In reply to Krutika Dhananjay from comment #5)
> (In reply to Pranith Kumar K from comment #3)
> > Shekhar,
> >    You need the following two options for O_DIRECT to be handled properly:
> > 
> > 1) performance.strict-o-direct on
> > 2) network.remote-dio off
> > 
> > Pranith
> 
> @Pranith - Part of the problem seems to be that the anonymous fds created by
> open-behind don't inherit the original open() flags. And when ob does wind
> reads and writes on these anon fds, posix winds up using the usual
> GF_ANON_FD_FLAGS (O_RDWR|O_LARGEFILE) value at the time of open() preceding
> the invoked fd operation (mostly read/write).
> 
> FWIW, I did try the same test Shekhar ran on my test cluster and disabling
> open-behind in addition to the two already known o-direct options that you
> mentioned above, seemed to fix the issue of growing cache size as the test
> progresses (although it didn't have any effect as per Shekhar when he tried
> the same).
> 
> I am yet to confirm the theory (and the presence/absence of O_DIRECT flag at
> the level of posix) through strace output of the bricks with and without
> open-behind. I will do the same tomorrow and update the bug.
> 
> -Krutika

I stand corrected. *Even* with open-behind disabled, like Shekhar rightly said, the issue still exists. I even disabled all perf xls and the problem still persists. I found that open()s are being issued on gfid handles with O_RDONLY|O_DIRECT flags in the bricks' strace output with all these xls disabled. I still need to check if pread()s are being issued on the same file descriptors though.

Still have some more strace analysis to do. Will update the bug again once the issue is RC'd.

-Krutika

Comment 9 Krutika Dhananjay 2016-09-15 16:04:21 UTC
So I checked the strace output again. The moment I disable open-behind, I start to see O_DIRECT in the strace logs. I grepped for other invocations of open() and realised there are all these files opened with the O_WRONLY flag (without O_DIRECT, of course) with a lot of pwrites being invoked on their fds. It took me a while to realise that these are filenames with patterns fio_bw*.log, fio_clat*.log, fio_lat*.log, etc. - in other words, log files created by fio itself which were ending up on the glusterfs mount point.

Possible dumb question inbound - Shekhar, did you invoke the fio command when your current working directory was the glusterfs mount point itself? I made that mistake today which is why I saw the same results as you did. :)

Could you please confirm this?

I ran the test again now, this time from $HOME. Disabling open-behind caused things to work for me.

The bug report, however, is still valid; I will send out a patch to make open-behind handle O_DIRECT. Meanwhile, you can work around the issue by disabling open-behind.

-Krutika
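
A sketch of the suggested workaround, assuming the rep2 volume from this report; after re-running the workload, the same grep used elsewhere in this bug can confirm whether O_DIRECT opens now appear in the brick strace logs:

gluster volume set rep2 performance.open-behind off

# re-run the random read workload, then check the brick strace output again
grep -i "o_direct" strace.log.* | egrep -v openat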

Comment 10 Shekhar Berry 2016-09-15 18:30:27 UTC
Krutika,

Thanks for looking into the bug.

Here are a couple of comments:

1) did you invoke the fio command when your current working directory was the glusterfs mount point itself?

No, I am using pbench_fio for my testing and it stores run logs under /var/lib/pbench-agent. Just FYI, I am not invoking pbench_fio from the gluster mount point itself :)

2) I redid the test with the performance.open-behind: off and performance.strict-o-direct: on options. The results are no different in my setup; I am still seeing full cache utilization.

I have not captured strace under this setting. I can capture and share that with you if required.

--Shekhar

Comment 11 Krutika Dhananjay 2016-09-16 05:12:36 UTC
Yes, strace output would be great.

1. Could you capture the strace logs of the bricks (the smaller the brick count, the easier it would be for me :) ) for the following cases:

i. strict-o-direct on, and everything else remains default (implying that open-behind will also be on)

ii. strict-o-direct on and open-behind turned off

iii. strict-o-direct on and all perf xls disabled (write-behind, open-behind, quick-read, read-ahead, io-cache, stat-prefetch).

For capturing the strace output, you can use this:

# strace -ff -f -p <pid-of-brick> -o <path-where-you-want-the-output-captured>

Note that you may have to hit Ctrl+C on the strace terminal after every run.



2. Along with 1, also start the test script (fio in your case) itself using strace. For this, you need to execute the following:

# strace -ff -f -o fio-strace.log <your-test-invocation-command-here>

-Krutika

Comment 12 Shekhar Berry 2016-09-17 07:54:25 UTC
I stand corrected. With 'open-behind' off, I am seeing that files are now opened with the O_DIRECT flag for both the write and read tests.

I redid the test with the following volume settings:

performance.open-behind: off
performance.strict-o-direct: on

Here's the strace output:

grep -i "o_direct" strace.log.* | egrep -v openat

strace.log.25363:open("/bricks/b01/g/.glusterfs/ab/2e/ab2ea2ca-1f6e-494a-b817-faec8627afe3", O_RDWR|O_DIRECT) = 33 <0.000014>
strace.log.25363:open("/bricks/b01/g/.glusterfs/10/f7/10f7cf7d-bd68-4414-a3c8-cb3b12092452", O_RDWR|O_DIRECT) = 35 <0.000016>
strace.log.25363:open("/bricks/b01/g/.glusterfs/51/ac/51acd630-6518-42f8-a9cf-c936dc5c3b9c", O_RDWR|O_DIRECT) = 37 <0.000011>
strace.log.25363:open("/bricks/b01/g/.glusterfs/a1/41/a141600d-8fbb-41f9-b0b8-34318125839e", O_RDWR|O_DIRECT) = 39 <0.000017>
strace.log.25363:open("/bricks/b01/g/.glusterfs/fa/cd/facda6a3-53c4-484e-a599-2488d0683c31", O_RDWR|O_DIRECT) = 43 <0.000039>
strace.log.25363:open("/bricks/b01/g/.glusterfs/52/02/5202698e-51d2-40d3-9a61-9b40ce2b427b", O_RDWR|O_DIRECT) = 44 <0.000014>
strace.log.25363:open("/bricks/b01/g/.glusterfs/ac/2c/ac2ceda0-d035-4b94-87bc-c12bced59c6c", O_RDONLY|O_DIRECT) = 45 <0.000040>
strace.log.25363:open("/bricks/b01/g/.glusterfs/51/ac/51acd630-6518-42f8-a9cf-c936dc5c3b9c", O_RDONLY|O_DIRECT) = 52 <0.000021>
strace.log.25454:open("/bricks/b01/g/.glusterfs/ca/47/ca475b51-9f61-4838-ba2c-8930b65a0c04", O_RDWR|O_DIRECT) = 34 <0.000019>
strace.log.25454:open("/bricks/b01/g/.glusterfs/11/d7/11d7083f-a4d0-41c2-9b48-efbd899b831a", O_RDWR|O_DIRECT) = 36 <0.000017>
strace.log.25454:open("/bricks/b01/g/.glusterfs/84/63/8463c0d7-3e68-44c7-b360-0a6622070c42", O_RDWR|O_DIRECT) = 38 <0.000017>
strace.log.25454:open("/bricks/b01/g/.glusterfs/9a/4c/9a4cb41f-487d-464f-9238-9db7385ecf53", O_RDWR|O_DIRECT) = 40 <0.000012>
strace.log.25454:open("/bricks/b01/g/.glusterfs/30/f9/30f929ce-9b66-4d3e-ab0f-a73c60d4d817", O_RDWR|O_DIRECT) = 41 <0.000013>
strace.log.25454:open("/bricks/b01/g/.glusterfs/85/50/8550a796-172b-4390-b976-b228eafd9ada", O_RDWR|O_DIRECT) = 42 <0.000016>
strace.log.25454:open("/bricks/b01/g/.glusterfs/9d/b2/9db21c69-39bc-4565-8793-0833cf21887d", O_RDWR|O_DIRECT) = 45 <0.000020>
strace.log.25454:open("/bricks/b01/g/.glusterfs/ac/2c/ac2ceda0-d035-4b94-87bc-c12bced59c6c", O_RDWR|O_DIRECT) = 48 <0.000026>
strace.log.25454:open("/bricks/b01/g/.glusterfs/a5/b6/a5b62db9-0d1a-49cb-b188-a1be216de017", O_RDWR|O_DIRECT) = 50 <0.000021>
strace.log.25454:open("/bricks/b01/g/.glusterfs/c1/fb/c1fb470d-63dc-4d1e-8079-93877bdcb2a5", O_RDONLY|O_DIRECT) = 36 <0.000026>
strace.log.25454:open("/bricks/b01/g/.glusterfs/11/d7/11d7083f-a4d0-41c2-9b48-efbd899b831a", O_RDONLY|O_DIRECT) = 48 <0.000030>
strace.log.25455:open("/bricks/b01/g/.glusterfs/81/04/810408fe-eb40-42e3-b53b-981dbf290af3", O_RDWR|O_DIRECT) = 49 <0.000037>
strace.log.25455:open("/bricks/b01/g/.glusterfs/c1/fb/c1fb470d-63dc-4d1e-8079-93877bdcb2a5", O_RDWR|O_DIRECT) = 52 <0.000013>
strace.log.25455:open("/bricks/b01/g/.glusterfs/30/f9/30f929ce-9b66-4d3e-ab0f-a73c60d4d817", O_RDONLY|O_DIRECT) = 33 <0.000022>
strace.log.25455:open("/bricks/b01/g/.glusterfs/9d/b2/9db21c69-39bc-4565-8793-0833cf21887d", O_RDONLY|O_DIRECT) = 35 <0.000032>
strace.log.25455:open("/bricks/b01/g/.glusterfs/84/63/8463c0d7-3e68-44c7-b360-0a6622070c42", O_RDONLY|O_DIRECT) = 49 <0.000027>
strace.log.25456:open("/bricks/b01/g/.glusterfs/2d/35/2d352297-ab44-4c92-8b1d-cb314f5f693b", O_RDWR|O_DIRECT) = 51 <0.000029>
strace.log.25456:open("/bricks/b01/g/.glusterfs/fa/cd/facda6a3-53c4-484e-a599-2488d0683c31", O_RDONLY|O_DIRECT) = 34 <0.000029>
strace.log.25457:open("/bricks/b01/g/.glusterfs/e5/13/e5134455-eb71-43b8-8662-355c79d5ef1a", O_RDWR|O_DIRECT) = 46 <0.000018>
strace.log.25457:open("/bricks/b01/g/.glusterfs/49/3b/493b7a5b-42f7-4c0a-9a4c-48956a05444f", O_RDWR|O_DIRECT) = 47 <0.000027>
strace.log.25457:open("/bricks/b01/g/.glusterfs/81/04/810408fe-eb40-42e3-b53b-981dbf290af3", O_RDONLY|O_DIRECT) = 47 <0.000031>
strace.log.25458:open("/bricks/b01/g/.glusterfs/52/02/5202698e-51d2-40d3-9a61-9b40ce2b427b", O_RDONLY|O_DIRECT) = 39 <0.000026>
strace.log.25458:open("/bricks/b01/g/.glusterfs/ab/2e/ab2ea2ca-1f6e-494a-b817-faec8627afe3", O_RDONLY|O_DIRECT) = 46 <0.000037>
strace.log.25458:open("/bricks/b01/g/.glusterfs/ca/47/ca475b51-9f61-4838-ba2c-8930b65a0c04", O_RDONLY|O_DIRECT) = 51 <0.000030>
strace.log.25459:open("/bricks/b01/g/.glusterfs/e5/13/e5134455-eb71-43b8-8662-355c79d5ef1a", O_RDONLY|O_DIRECT) = 40 <0.000017>
strace.log.25459:open("/bricks/b01/g/.glusterfs/10/f7/10f7cf7d-bd68-4414-a3c8-cb3b12092452", O_RDONLY|O_DIRECT) = 42 <0.000014>
strace.log.25459:open("/bricks/b01/g/.glusterfs/49/3b/493b7a5b-42f7-4c0a-9a4c-48956a05444f", O_RDONLY|O_DIRECT) = 43 <0.000044>
strace.log.25460:open("/bricks/b01/g/.glusterfs/85/50/8550a796-172b-4390-b976-b228eafd9ada", O_RDONLY|O_DIRECT) = 37 <0.000012>
strace.log.25461:open("/bricks/b01/g/.glusterfs/a5/b6/a5b62db9-0d1a-49cb-b188-a1be216de017", O_RDONLY|O_DIRECT) = 38 <0.000039>
strace.log.25461:open("/bricks/b01/g/.glusterfs/9a/4c/9a4cb41f-487d-464f-9238-9db7385ecf53", O_RDONLY|O_DIRECT) = 44 <0.000027>
strace.log.25461:open("/bricks/b01/g/.glusterfs/a1/41/a141600d-8fbb-41f9-b0b8-34318125839e", O_RDONLY|O_DIRECT) = 50 <0.000015>
strace.log.25462:open("/bricks/b01/g/.glusterfs/2d/35/2d352297-ab44-4c92-8b1d-cb314f5f693b", O_RDONLY|O_DIRECT) = 41 <0.000017>


--Shekhar

Comment 13 Worker Ant 2016-09-23 10:04:16 UTC
REVIEW: http://review.gluster.org/15552 (performance/open-behind: Pass O_DIRECT flags for anon fd reads when required) posted (#1) for review on release-3.8 by Krutika Dhananjay (kdhananj)

Comment 14 Worker Ant 2016-09-26 04:42:28 UTC
COMMIT: http://review.gluster.org/15552 committed in release-3.8 by Raghavendra G (rgowdapp) 
------
commit 2164d3fbf7301c8db8eaa3a6a37ab06225473664
Author: Krutika Dhananjay <kdhananj>
Date:   Tue Sep 20 12:05:23 2016 +0530

    performance/open-behind: Pass O_DIRECT flags for anon fd reads when required
    
            Backport of: http://review.gluster.org/15537
            cherry-picked from a412a4f50d8ca2ae68dbfa93b80757889150ce99
    
    Writes are already passing the correct flags at the time of open().
    
    Also, make io-cache honor direct-io for anon-fds with
    O_DIRECT flag during reads.
    
    Change-Id: I9eb89c3bda34f9287861eb3b53c3d6a7b967c105
    BUG: 1375959
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/15552
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 15 Niels de Vos 2016-10-20 14:03:22 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.5, please open a new bug report.

glusterfs-3.8.5 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/announce/2016-October/000061.html
[2] https://www.gluster.org/pipermail/gluster-users/

